Choosing your Compute Add-on

Choosing the right Compute Add-on for your vector workload.


You have two options for scaling your vector workload:

  1. Increase the size of your database. This guide will help you choose the right size for your workload.
  2. Spread your workload across multiple databases. You can find more details about this approach in Engineering for Scale.

Dimensionality

The number of dimensions in your embeddings is the most important factor in choosing the right Compute Add-on. In general, the lower the dimensionality, the better the performance. We've provided guidance for some of the more common embedding dimensions below. For each benchmark, we used Vecs to create a collection, upload the embeddings to a single table, and create both IVFFlat and HNSW indexes on the embedding column using the inner-product distance measure. We then ran a series of queries to measure the performance of different compute add-ons.
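One reason lower dimensionality tends to perform better is memory. The pgvector documentation states that each stored vector takes 4 * dimensions + 8 bytes, so the raw embedding data grows linearly with the dimension count. The sketch below estimates that raw size for the three dimensionalities benchmarked in this guide. The function name is hypothetical, and the estimate ignores index size, row overhead, and everything else Postgres keeps in memory, so treat it as a rough lower bound rather than a sizing rule.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Estimate raw storage for pgvector `vector` values.

    Assumes pgvector's documented layout: 4 bytes per dimension plus
    8 bytes of per-vector overhead. Index and table overhead are NOT
    included, so real memory usage will be higher.
    """
    return num_vectors * (4 * dimensions + 8)


# Compare the three dimensionalities used in the benchmarks.
for dims in (384, 960, 1536):
    gib = raw_vector_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>4} dimensions, 1,000,000 vectors: ~{gib:.2f} GiB raw")
```

Comparing this figure with the RAM Usage column in the tables below gives a feel for how much of the footprint comes from the index and the rest of the database rather than from the embeddings themselves.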

HNSW

384 dimensions

This benchmark uses the dbpedia-entities-openai-1M dataset, which contains 1,000,000 embeddings of text, regenerated as 384-dimension embeddings using gte-small.

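| Plan | Vectors | m | ef_construction | ef_search | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Micro | 100,000 | 16 | 64 | 60 | 580 | 0.017 sec | 0.024 sec | 1.2 GB (Swap) | 1 GB |
| Small | 250,000 | 24 | 64 | 60 | 440 | 0.022 sec | 0.033 sec | 2 GB | 2 GB |
| Medium | 500,000 | 24 | 64 | 80 | 350 | 0.028 sec | 0.045 sec | 4 GB | 4 GB |
| Large | 1,000,000 | 32 | 80 | 100 | 270 | 0.073 sec | 0.108 sec | 7 GB | 8 GB |
| XL | 1,000,000 | 32 | 80 | 100 | 525 | 0.038 sec | 0.059 sec | 9 GB | 16 GB |
| 2XL | 1,000,000 | 32 | 80 | 100 | 790 | 0.025 sec | 0.037 sec | 9 GB | 32 GB |
| 4XL | 1,000,000 | 32 | 80 | 100 | 1650 | 0.015 sec | 0.018 sec | 11 GB | 64 GB |
| 8XL | 1,000,000 | 32 | 80 | 100 | 2690 | 0.015 sec | 0.016 sec | 13 GB | 128 GB |
| 12XL | 1,000,000 | 32 | 80 | 100 | 3900 | 0.014 sec | 0.016 sec | 13 GB | 192 GB |
| 16XL | 1,000,000 | 32 | 80 | 100 | 4200 | 0.014 sec | 0.016 sec | 20 GB | 256 GB |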

Accuracy was 0.99 for these benchmarks.

960 dimensions

This benchmark uses the gist-960 dataset, which contains 1,000,000 embeddings of images. Each embedding is 960 dimensions.

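| Plan | Vectors | m | ef_construction | ef_search | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Micro | 30,000 | 16 | 64 | 65 | 430 | 0.024 sec | 0.034 sec | 1.2 GB (Swap) | 1 GB |
| Small | 100,000 | 32 | 80 | 60 | 260 | 0.040 sec | 0.054 sec | 2.2 GB (Swap) | 2 GB |
| Medium | 250,000 | 32 | 80 | 90 | 120 | 0.083 sec | 0.106 sec | 4 GB | 4 GB |
| Large | 500,000 | 32 | 80 | 120 | 160 | 0.063 sec | 0.087 sec | 7 GB | 8 GB |
| XL | 1,000,000 | 32 | 80 | 200 | 200 | 0.049 sec | 0.072 sec | 13 GB | 16 GB |
| 2XL | 1,000,000 | 32 | 80 | 200 | 340 | 0.025 sec | 0.029 sec | 17 GB | 32 GB |
| 4XL | 1,000,000 | 32 | 80 | 200 | 630 | 0.031 sec | 0.050 sec | 18 GB | 64 GB |
| 8XL | 1,000,000 | 32 | 80 | 200 | 1100 | 0.034 sec | 0.048 sec | 19 GB | 128 GB |
| 12XL | 1,000,000 | 32 | 80 | 200 | 1420 | 0.041 sec | 0.095 sec | 21 GB | 192 GB |
| 16XL | 1,000,000 | 32 | 80 | 200 | 1650 | 0.037 sec | 0.081 sec | 23 GB | 256 GB |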

Accuracy was 0.99 for these benchmarks.

QPS can also be improved by increasing m and ef_construction. This will allow you to use a smaller value for ef_search and increase QPS.

1536 dimensions

This benchmark uses the dbpedia-entities-openai-1M dataset, which contains 1,000,000 embeddings of text. For compute add-ons Large and below, it uses a dataset of 224,482 embeddings from Wikipedia articles. Each embedding has 1536 dimensions and was created with the OpenAI Embeddings API.

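| Plan | Vectors | m | ef_construction | ef_search | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Micro | 15,000 | 16 | 40 | 40 | 480 | 0.011 sec | 0.016 sec | 1.2 GB (Swap) | 1 GB |
| Small | 50,000 | 32 | 64 | 100 | 175 | 0.031 sec | 0.051 sec | 2.2 GB (Swap) | 2 GB |
| Medium | 100,000 | 32 | 64 | 100 | 240 | 0.083 sec | 0.126 sec | 4 GB | 4 GB |
| Large | 224,482 | 32 | 64 | 100 | 280 | 0.017 sec | 0.028 sec | 8 GB | 8 GB |
| XL | 500,000 | 24 | 56 | 100 | 360 | 0.055 sec | 0.135 sec | 13 GB | 16 GB |
| 2XL | 1,000,000 | 24 | 56 | 250 | 560 | 0.036 sec | 0.058 sec | 32 GB | 32 GB |
| 4XL | 1,000,000 | 24 | 56 | 250 | 950 | 0.021 sec | 0.033 sec | 39 GB | 64 GB |
| 8XL | 1,000,000 | 24 | 56 | 250 | 1650 | 0.016 sec | 0.023 sec | 40 GB | 128 GB |
| 12XL | 1,000,000 | 24 | 56 | 250 | 1900 | 0.015 sec | 0.021 sec | 38 GB | 192 GB |
| 16XL | 1,000,000 | 24 | 56 | 250 | 2200 | 0.015 sec | 0.020 sec | 40 GB | 256 GB |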

Accuracy was 0.99 for these benchmarks.

QPS can also be improved by increasing m and ef_construction. This will allow you to use a smaller value for ef_search and increase QPS. For example, increasing m to 32 and ef_construction to 80 for 4XL will increase QPS to 1280.
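To make the three HNSW parameters concrete, here is a minimal sketch that builds the pgvector SQL for an inner-product HNSW index and for the query-time ef_search setting. The statement shapes follow pgvector's documented syntax. The helper names and the `items` / `embedding` identifiers are hypothetical, and the ef_search value is illustrative only, since the text does not state which value was used with m = 32 and ef_construction = 80.

```python
def hnsw_index_sql(table: str, column: str, m: int, ef_construction: int) -> str:
    """Build a CREATE INDEX statement for an inner-product HNSW index.

    `table` and `column` are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_ip_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def hnsw_search_sql(ef_search: int) -> str:
    """Build the session setting that controls search breadth at query time."""
    return f"SET hnsw.ef_search = {ef_search};"


# m and ef_construction come from the 4XL example above;
# the table and column names and ef_search = 100 are assumptions.
print(hnsw_index_sql("items", "embedding", m=32, ef_construction=80))
print(hnsw_search_sql(100))
```

m and ef_construction are fixed when the index is built, while ef_search can be changed per session, which is why the trade-off described above is possible: a more thoroughly built index can reach the same accuracy with a smaller ef_search.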

IVFFlat

384 dimensions

This benchmark uses the dbpedia-entities-openai-1M dataset, which contains 1,000,000 embeddings of text, regenerated as 384-dimension embeddings using gte-small.

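| Plan | Vectors | Lists | Probes | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Micro | 100,000 | 500 | 50 | 205 | 0.048 sec | 0.066 sec | 1.2 GB (Swap) | 1 GB |
| Small | 250,000 | 1000 | 60 | 160 | 0.062 sec | 0.079 sec | 2 GB | 2 GB |
| Medium | 500,000 | 2000 | 80 | 120 | 0.082 sec | 0.104 sec | 3.2 GB | 4 GB |
| Large | 1,000,000 | 5000 | 150 | 75 | 0.269 sec | 0.375 sec | 6.5 GB | 8 GB |
| XL | 1,000,000 | 5000 | 150 | 150 | 0.131 sec | 0.178 sec | 9 GB | 16 GB |
| 2XL | 1,000,000 | 5000 | 150 | 300 | 0.066 sec | 0.099 sec | 10 GB | 32 GB |
| 4XL | 1,000,000 | 5000 | 150 | 570 | 0.035 sec | 0.046 sec | 10 GB | 64 GB |
| 8XL | 1,000,000 | 5000 | 150 | 1400 | 0.023 sec | 0.028 sec | 12 GB | 128 GB |
| 12XL | 1,000,000 | 5000 | 150 | 1550 | 0.030 sec | 0.039 sec | 12 GB | 192 GB |
| 16XL | 1,000,000 | 5000 | 150 | 1800 | 0.030 sec | 0.039 sec | 16 GB | 256 GB |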

960 dimensions

This benchmark uses the gist-960 dataset, which contains 1,000,000 embeddings of images. Each embedding is 960 dimensions.

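| Plan | Vectors | Lists | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Micro | 30,000 | 30 | 75 | 0.065 sec | 0.088 sec | 1.1 GB (Swap) | 1 GB |
| Small | 100,000 | 100 | 78 | 0.064 sec | 0.092 sec | 1.8 GB | 2 GB |
| Medium | 250,000 | 250 | 58 | 0.085 sec | 0.129 sec | 3.2 GB | 4 GB |
| Large | 500,000 | 500 | 55 | 0.088 sec | 0.140 sec | 5 GB | 8 GB |
| XL | 1,000,000 | 1000 | 110 | 0.046 sec | 0.070 sec | 14 GB | 16 GB |
| 2XL | 1,000,000 | 1000 | 235 | 0.083 sec | 0.136 sec | 10 GB | 32 GB |
| 4XL | 1,000,000 | 1000 | 420 | 0.071 sec | 0.106 sec | 11 GB | 64 GB |
| 8XL | 1,000,000 | 1000 | 815 | 0.072 sec | 0.106 sec | 13 GB | 128 GB |
| 12XL | 1,000,000 | 1000 | 1150 | 0.052 sec | 0.078 sec | 15.5 GB | 192 GB |
| 16XL | 1,000,000 | 1000 | 1345 | 0.072 sec | 0.106 sec | 17.5 GB | 256 GB |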

1536 dimensions

This benchmark uses the dbpedia-entities-openai-1M dataset, which contains 1,000,000 embeddings of text. Each embedding has 1536 dimensions and was created with the OpenAI Embeddings API.

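| Plan | Vectors | Lists | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Micro | 20,000 | 40 | 135 | 0.372 sec | 0.412 sec | 1.2 GB (Swap) | 1 GB |
| Small | 50,000 | 100 | 140 | 0.357 sec | 0.398 sec | 1.8 GB | 2 GB |
| Medium | 100,000 | 200 | 130 | 0.383 sec | 0.446 sec | 3.7 GB | 4 GB |
| Large | 250,000 | 500 | 130 | 0.378 sec | 0.434 sec | 7 GB | 8 GB |
| XL | 500,000 | 1000 | 235 | 0.213 sec | 0.271 sec | 13.5 GB | 16 GB |
| 2XL | 1,000,000 | 2000 | 380 | 0.133 sec | 0.236 sec | 30 GB | 32 GB |
| 4XL | 1,000,000 | 2000 | 720 | 0.068 sec | 0.120 sec | 35 GB | 64 GB |
| 8XL | 1,000,000 | 2000 | 1250 | 0.039 sec | 0.066 sec | 38 GB | 128 GB |
| 12XL | 1,000,000 | 2000 | 1600 | 0.030 sec | 0.052 sec | 41 GB | 192 GB |
| 16XL | 1,000,000 | 2000 | 1790 | 0.029 sec | 0.051 sec | 45 GB | 256 GB |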

For 1,000,000 vectors, 10 probes result in an accuracy of 0.91. For 500,000 vectors and below, 10 probes result in an accuracy in the range of 0.95 to 0.99. To increase accuracy, increase the number of probes.
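As a rough illustration of where lists and probes are set, the sketch below builds the pgvector SQL for an inner-product IVFFlat index and for the probes setting. The statement shapes follow pgvector's documented syntax. The helper names and the `items` / `embedding` identifiers are hypothetical. The values 2000 and 10 are taken from the 1536-dimension table and the note above.

```python
def ivfflat_index_sql(table: str, column: str, lists: int) -> str:
    """Build a CREATE INDEX statement for an inner-product IVFFlat index.

    `lists` is fixed at build time. Identifiers are interpolated directly,
    so they must come from trusted code.
    """
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_ip_ops) "
        f"WITH (lists = {lists});"
    )


def ivfflat_probes_sql(probes: int) -> str:
    """Build the session setting for how many lists each query scans."""
    if probes < 1:
        raise ValueError("probes must be at least 1")
    return f"SET ivfflat.probes = {probes};"


# Table and column names are assumptions for illustration.
print(ivfflat_index_sql("items", "embedding", lists=2000))
print(ivfflat_probes_sql(10))
```

Because probes is a session setting rather than an index property, it can be raised to trade QPS for accuracy without rebuilding the index.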

Performance tips

There are various ways to improve your pgvector performance. Here are some tips:

Pre-warming your database

It's useful to execute a few thousand "warm-up" queries before going into production. This helps with RAM utilization and can also help you determine whether you've selected the right instance size for your workload.
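A warm-up pass can be as simple as a loop that issues queries and discards the results. The sketch below is one possible shape, not a prescribed procedure: `warm_up` and its parameters are hypothetical, and `run_query` stands in for whatever function sends a similarity search to your database. It uses random query vectors for simplicity. A sample of real queries would be more representative of production traffic.

```python
import random
from typing import Callable, Sequence


def warm_up(
    run_query: Callable[[Sequence[float]], object],
    dimensions: int,
    num_queries: int = 2000,
    seed: int = 0,
) -> int:
    """Issue `num_queries` similarity searches and discard the results.

    `run_query` is a caller-supplied function (an assumption of this
    sketch) that executes one search for the given query vector.
    Returns the number of queries executed.
    """
    rng = random.Random(seed)
    executed = 0
    for _ in range(num_queries):
        query_vector = [rng.uniform(-1.0, 1.0) for _ in range(dimensions)]
        run_query(query_vector)
        executed += 1
    return executed


# Usage with a stand-in query function that only records its input.
seen = []
print(warm_up(seen.append, dimensions=384, num_queries=5))
```

Watching memory usage and latency while the loop runs is a practical way to check whether the chosen instance size holds the working set without swapping.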

Fine-tune index parameters

You can increase queries per second (QPS) by increasing m and ef_construction (for HNSW) or lists (for IVFFlat). There is an important caveat: building the index takes longer with higher values for these parameters.

Check out more tips and the complete step-by-step guide in Going to Production for AI applications.

Benchmark methodology

We follow techniques outlined in the ANN Benchmarks methodology. A Python test runner is responsible for uploading the data, creating the index, and running the queries. The pgvector engine is implemented using vecs, a Python client for pgvector.

Each test runs for a minimum of 30 to 40 minutes and includes a series of experiments executed at different concurrency levels to measure the engine's performance under different load types. The results are then averaged.

As a general recommendation, we suggest using a concurrency level of 5 or more for most workloads and 30 or more for high-load workloads.
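The methodology above can be approximated with a small harness that runs queries at several concurrency levels, computes QPS, mean latency, and p95 latency for each level, and averages the results. This is a simplified sketch and not the actual test runner. All function names are hypothetical, `run_query` is a caller-supplied stand-in for one database query, and the nearest-rank percentile is one of several reasonable definitions.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies: Sequence[float], wall_seconds: float) -> Dict[str, float]:
    """Turn per-query latencies and total wall time into summary metrics."""
    return {
        "qps": len(latencies) / wall_seconds,
        "latency_mean": mean(latencies),
        "latency_p95": percentile(latencies, 95),
    }


def run_level(run_query: Callable[[], object], num_queries: int, concurrency: int) -> Dict[str, float]:
    """Run `num_queries` queries using `concurrency` worker threads."""

    def timed(_: int) -> float:
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, range(num_queries)))
    return summarize(latencies, time.perf_counter() - started)


def average_levels(results: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric across concurrency levels."""
    return {key: mean(r[key] for r in results) for key in results[0]}


# Stand-in query that sleeps briefly instead of contacting a database.
levels = [run_level(lambda: time.sleep(0.001), num_queries=50, concurrency=c) for c in (5, 30)]
print(average_levels(levels))
```

The two concurrency levels in the usage example mirror the recommendation above (5 for most workloads, 30 for high load). A real run would use many more queries and a much longer duration.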