pgvector
pgvector is an open-source PostgreSQL extension that adds a vector data type and similarity-search operators, allowing embeddings to be stored and queried directly inside a relational database.
pgvector is an open-source extension for PostgreSQL that adds support for storing and searching vector embeddings directly within a relational database. It introduces a dedicated vector column type along with operators and index structures for similarity search, allowing developers to keep embeddings alongside their existing relational data rather than running a separate, specialised vector database. This makes pgvector a popular choice for teams that already use PostgreSQL and want to add semantic capabilities without adopting new infrastructure.
How it works
pgvector adds a vector column type that holds an array of single-precision floating-point numbers representing an embedding. It provides distance operators for the three most common similarity measures: Euclidean (L2) distance (<->), inner product (<#>, which returns the negated inner product so that smaller values mean closer matches), and cosine distance (<=>). A typical query orders rows by the distance between a stored vector and a query vector and applies a LIMIT, returning the nearest neighbours. Because this all happens within standard SQL, vector similarity can be combined with ordinary filters, joins, and aggregations in a single query.
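The three distance measures can be illustrated outside the database. The following is a minimal pure-Python sketch, not pgvector's implementation; the function names and the nearest helper are hypothetical and exist only to mirror what an ORDER BY on a distance operator followed by LIMIT would compute.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the quantity the <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """The <#> operator returns the inner product multiplied by -1,
    so that smaller values still mean 'more similar' when sorting ascending."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus cosine similarity, the quantity the <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(rows, query, distance, k):
    """Emulate ORDER BY embedding <op> query LIMIT k over (id, embedding) pairs.

    Hypothetical helper: an exact scan over an in-memory list of rows.
    """
    ranked = sorted(rows, key=lambda row: distance(row[1], query))
    return [row_id for row_id, _ in ranked[:k]]
```

Because every function returns a distance rather than a similarity, sorting in ascending order always puts the closest rows first, which is why the inner-product operator is negated.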
For example, an application can retrieve the most similar documents to a query embedding while simultaneously filtering on a category column, a date range, or a user identifier, joining the result against other tables as needed. This tight integration with relational features is the principal advantage of pgvector over standalone vector stores for many applications.
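A filtered similarity query of this kind might look like the sketch below. The schema (documents and authors tables and their columns) is hypothetical, and the %(name)s placeholders assume a Python database driver that uses named pyformat parameters, such as psycopg; only the <=> operator and the vector cast come from pgvector itself.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers as pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


# Hypothetical schema, for illustration only:
#   documents(id, title, category, published_at, author_id, embedding vector)
#   authors(id, name)
FILTERED_SEARCH_SQL = """
SELECT d.id, d.title, a.name AS author,
       d.embedding <=> %(query)s::vector AS distance
FROM documents AS d
JOIN authors AS a ON a.id = d.author_id
WHERE d.category = %(category)s
  AND d.published_at >= %(since)s
ORDER BY d.embedding <=> %(query)s::vector
LIMIT %(limit)s
"""


def filtered_search_params(embedding, category, since, limit=5):
    """Build the parameter mapping that accompanies FILTERED_SEARCH_SQL."""
    return {
        "query": to_vector_literal(embedding),
        "category": category,
        "since": since,
        "limit": limit,
    }
```

With a driver of that kind, the statement would be run as cursor.execute(FILTERED_SEARCH_SQL, filtered_search_params(embedding, "news", "2024-01-01")). Passing the embedding as a parameter rather than interpolating it into the SQL text keeps the query safe from injection.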
Indexing and performance
To make search fast at scale, pgvector supports two approximate-nearest-neighbour index types. IVFFlat partitions vectors into lists around cluster centroids and, at query time, searches only the lists closest to the query vector. HNSW builds a navigable multi-layer graph that generally offers better recall and query speed at the cost of more memory and slower index construction. Both are approximate: they trade a small loss of recall for speed, and the balance is tunable. With HNSW indexing, pgvector can handle millions of vectors, and published benchmarks report query times under roughly 20 milliseconds at one million vectors with recall above 95 percent. Such figures depend on hardware, vector dimensionality, and index parameters, but they indicate performance sufficient for many production workloads.
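The partition-and-probe idea behind IVFFlat can be sketched in a few lines. This is a toy illustration under stated assumptions, not pgvector's code: the centroids are supplied by hand here (pgvector derives them by clustering the data when the index is built), and the function names are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_lists(vectors, centroids):
    """Assign each (id, vector) pair to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vec_id, vec in vectors:
        closest = min(range(len(centroids)), key=lambda i: l2(vec, centroids[i]))
        lists[closest].append((vec_id, vec))
    return lists


def search(lists, centroids, query, k, probes):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [item for i in order[:probes] for item in lists[i]]
    candidates.sort(key=lambda item: l2(item[1], query))
    return [vec_id for vec_id, _ in candidates[:k]]


# Assumed toy data: two hand-picked centroids and four 2-D vectors.
centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [("a", (4.0, 0.0)), ("b", (6.0, 0.0)), ("c", (0.0, 1.0)), ("d", (10.0, 1.0))]
lists = build_lists(vectors, centroids)

one_probe = search(lists, centroids, (5.5, 0.0), k=2, probes=1)
two_probes = search(lists, centroids, (5.5, 0.0), k=2, probes=2)
```

In this example the true second-nearest vector, a, sits just across the partition boundary, so probing one list returns b and d, while probing both lists returns b and a. This is the recall-versus-speed trade-off in miniature. In pgvector the corresponding knobs are the lists option given when the IVFFlat index is created and the ivfflat.probes setting applied at query time.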
The 0.7.0 release and subsequent versions expanded pgvector's capabilities. Version 0.7.0 added further vector representations, including half-precision (halfvec), sparse (sparsevec), and binary vectors, along with additional distance functions and indexing improvements, reflecting steady development driven by the surge in demand for embedding storage.
Use cases and ecosystem
pgvector enables similarity and semantic search, retrieval-augmented generation, image search, recommendation systems, and other natural-language and computer-vision applications. It has been widely adopted because PostgreSQL is one of the most common databases in production, and major managed platforms support the extension. Supabase, Microsoft Azure Database for PostgreSQL, Amazon RDS and Aurora, and Google Cloud SQL all offer pgvector, lowering the barrier to adding vector search to existing systems.
A frequently cited argument in favour of pgvector is operational simplicity. For applications whose vector counts reach the millions rather than the billions, keeping embeddings in the same database as the rest of the data avoids the cost and complexity of synchronising a separate system, which is why some practitioners argue that many teams do not need a dedicated vector database at all. For the largest or most demanding workloads, purpose-built systems such as Milvus or Qdrant may still be preferable.
References
- pgvector. (2026). pgvector GitHub Repository. https://github.com/pgvector/pgvector
- PostgreSQL. (2024). pgvector 0.7.0 Released. https://www.postgresql.org/about/news/pgvector-070-released-2852/
- Supabase. (2026). pgvector: Embeddings and vector similarity.
- Encore. (2025). pgvector Guide: Vector Search and RAG in PostgreSQL.