Vector Database
A specialised database system that stores data as high-dimensional numerical vectors and enables fast approximate nearest-neighbour search, forming the retrieval backbone of semantic search and RAG systems.
A vector database is a data management system designed to store, index, and query high-dimensional vectors — numerical representations of data objects such as text, images, audio, or code — with a primary operation of finding the vectors most similar to a given query vector. Unlike relational databases that retrieve rows matching exact field values, or full-text search engines that rank documents by keyword overlap, vector databases retrieve items by semantic or structural similarity in a continuous embedding space. They are a foundational component of modern AI infrastructure, enabling applications such as retrieval-augmented generation (RAG), semantic search, recommendation systems, anomaly detection, and multimodal retrieval.[^1]
Embeddings and Vector Representations
The utility of a vector database depends on the quality of the embeddings stored within it. An embedding is a dense numerical vector produced by a neural network — typically a transformer encoder — that maps an object (a sentence, a product description, a medical image, a piece of code) to a point in a high-dimensional space. Objects that are semantically or perceptually similar are mapped to nearby points; dissimilar objects are placed far apart.
Embedding dimensionality varies by model. OpenAI's text-embedding-3-large model produces 3,072-dimensional vectors by default. Sentence-BERT variants commonly produce 384- or 768-dimensional vectors. CLIP image embeddings are typically 512, 768, or 1,024 dimensions, depending on the model variant. The choice of embedding model determines the quality of the similarity relationships encoded in the vector space.
Indexing and Search Algorithms
Searching for the nearest neighbour to a query vector in a high-dimensional space is computationally challenging. An exact brute-force search comparing the query against every stored vector is accurate but has linear time complexity, becoming impractical at the scale of millions or billions of vectors.
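As a point of reference, exact search can be sketched in a few lines. The snippet below is a minimal illustration, assuming a small in-memory dictionary of toy 2-dimensional vectors; the function name `brute_force_knn` and the sample data are hypothetical and not taken from any particular product.

```python
import heapq
import math

def brute_force_knn(query, vectors, k):
    """Exact k-nearest-neighbour search: one distance computation per stored vector."""
    scored = ((math.dist(query, vec), vec_id) for vec_id, vec in vectors.items())
    # nsmallest keeps only the k best (distance, id) pairs, closest first.
    return heapq.nsmallest(k, scored)

# Toy data (hypothetical): ids mapped to 2-dimensional vectors.
vectors = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 2.0],
    "d": [3.0, 3.0],
}
nearest = brute_force_knn([0.9, 0.1], vectors, k=2)
print([vec_id for _, vec_id in nearest])  # ['b', 'a']
```

Every query touches every vector, so the cost is proportional to the number of vectors times their dimensionality, which is the linear scaling described above.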
Vector databases address this using Approximate Nearest Neighbour (ANN) algorithms, which trade a small reduction in recall for dramatic gains in query speed.
HNSW (Hierarchical Navigable Small World) is among the most widely deployed ANN algorithms. It builds a multi-layer proximity graph in which the bottom layer contains every vector and each higher layer contains a progressively sparser random subset of vectors connected by longer-range links. Search starts at an entry point in the top layer, greedily moves towards the query, and descends layer by layer, refining its candidates at each level. Query cost grows roughly logarithmically with dataset size, and well-tuned HNSW indexes typically deliver high recall (95–99%) for most workloads, depending on construction and search parameters.[^2]
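The layered descent can be illustrated with a heavily simplified sketch. It is not the full HNSW algorithm: real implementations assign levels randomly, insert vectors incrementally, and keep a candidate list larger than one during search. Here the levels are assigned deterministically, each layer is linked by brute force, and the search is purely greedy. All names (`build_layers`, `greedy_search`) and the toy data are assumptions made for illustration.

```python
import math

def build_layers(points, levels, m):
    """Link each node to its m nearest neighbours on every layer it belongs to."""
    layers = []
    for layer in range(max(levels.values()) + 1):
        members = [n for n in points if levels[n] >= layer]
        graph = {}
        for n in members:
            others = sorted(
                (o for o in members if o != n),
                key=lambda o: math.dist(points[n], points[o]),
            )
            graph[n] = others[:m]
        layers.append(graph)
    return layers  # layers[0] holds every node; higher layers are sparser

def greedy_search(points, layers, entry, query):
    """Walk greedily towards the query on each layer, from the top layer down."""
    current = entry
    for graph in reversed(layers):
        while True:
            best = min(
                graph[current],
                key=lambda n: math.dist(points[n], query),
                default=current,
            )
            if math.dist(points[best], query) >= math.dist(points[current], query):
                break  # no neighbour is closer: drop to the next layer
            current = best
    return current

# Toy data (hypothetical): 16 points on a line.
points = {i: [float(i), 0.0] for i in range(16)}
# Deterministic stand-in for HNSW's random level assignment.
levels = {i: 2 if i % 8 == 0 else 1 if i % 4 == 0 else 0 for i in range(16)}
layers = build_layers(points, levels, m=2)
found = greedy_search(points, layers, entry=0, query=[10.3, 0.0])
print(found)  # 10
```

In this run the search hops 0 → 8 on the top layer, 8 → 12 on the middle layer, and 12 → 11 → 10 on the bottom layer, visiting only a handful of the 16 nodes.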
IVF (Inverted File Index) divides the vector space into clusters, assigns each vector to its nearest cluster centroid, and searches only the most relevant clusters at query time. IVF variants such as IVF-PQ (with product quantisation for compression) are commonly used for very large-scale deployments where memory efficiency is critical.
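A minimal sketch of the IVF idea follows, without product quantisation. It assumes the centroids have already been obtained (in practice usually by k-means on a training sample) and simply hard-codes them; `build_ivf`, `ivf_search`, and the `nprobe` parameter name are illustrative choices rather than the API of any specific library.

```python
import heapq
import math

def build_ivf(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = heapq.nsmallest(
        nprobe, range(len(centroids)), key=lambda c: math.dist(query, centroids[c])
    )
    candidates = [vec_id for c in probe for vec_id in lists[c]]
    return heapq.nsmallest(k, candidates, key=lambda v: math.dist(query, vectors[v]))

# Hypothetical centroids and vectors, chosen by hand for the example.
centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
vectors = {
    "a": [1.0, 1.0],
    "b": [0.5, -0.5],
    "c": [9.0, 1.0],
    "d": [11.0, -1.0],
    "e": [4.8, 0.0],  # lies near the boundary between clusters 0 and 1
    "f": [1.0, 9.0],
}
lists = build_ivf(vectors, centroids)
query = [5.2, 0.0]
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=1))  # ['c']
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=2))  # ['e']
```

With `nprobe=1` the search misses the true nearest neighbour `e`, which sits just across a cluster boundary; probing a second cluster recovers it at the cost of scanning more vectors. This is the recall-versus-speed trade-off that characterises ANN search.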
FAISS (Facebook AI Similarity Search) is an open-source library from Meta that implements multiple ANN algorithms and serves as the retrieval backend for many vector database products.
Similarity Metrics
Vector databases support multiple distance metrics to quantify similarity:
| Metric | Formula | Best for |
|--------|---------|----------|
| Cosine similarity | Normalised dot product | Text embeddings, direction matters |
| Euclidean distance | L2 norm | Spatial data, magnitude matters |
| Dot product | Raw dot product | Scaled embeddings (e.g., OpenAI text-embedding) |
| Hamming distance | Bit-level XOR | Binary embeddings |
Most NLP applications use cosine similarity because it measures directional alignment in embedding space regardless of vector magnitude, making it robust to embedding normalisation differences.
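The four metrics can be written out directly. The sketch below uses plain Python lists for dense vectors and, as an assumption for illustration, Python integers as packed bit vectors for the Hamming case.

```python
import math

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

def euclidean_distance(a, b):
    return math.dist(a, b)

def hamming_distance(a_bits, b_bits):
    # XOR marks the differing bit positions; count the set bits.
    return bin(a_bits ^ b_bits).count("1")

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude

print(cosine_similarity(a, b))           # approximately 1.0
print(dot_product(a, b))                 # 28.0
print(euclidean_distance(a, b))          # approximately 3.742
print(hamming_distance(0b1011, 0b0010))  # 2
```

The vectors `a` and `b` point in the same direction, so their cosine similarity is effectively 1 even though their Euclidean distance is non-zero, which shows why cosine similarity is insensitive to magnitude.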
Leading Vector Database Systems
The vector database market expanded rapidly from 2022 onward, producing a diverse set of specialised systems.
Pinecone is a fully managed cloud-native vector database. It abstracts all infrastructure management, offering a simple API for upsert and query operations. Pinecone introduced Dedicated Read Nodes in public preview in late 2025 for predictable performance at high query volumes, and its Pinecone Assistant product (generally available January 2025) bundles chunking, embedding, retrieval, and answer generation behind a single endpoint.[^3]
Weaviate is an open-source vector database with native hybrid search (combining vector similarity and BM25 keyword search), support for multimodal data types, and a modular architecture for plugging in embedding models directly. It scales horizontally by distributing shards across a cluster.
Qdrant is an open-source system written in Rust, optimised for filtered vector search: queries that combine an ANN search with structured metadata filters. It is available as a self-hosted deployment or via Qdrant Cloud. Benchmarks published in 2025 reported Qdrant achieving 20 ms p95 latency at 15,000 queries per second on billion-vector datasets, although such figures depend heavily on hardware and index configuration.
pgvector is a PostgreSQL extension that adds vector storage and ANN search to an existing relational database. It allows teams already using PostgreSQL to add vector search without introducing a separate data store, at the cost of some performance relative to purpose-built systems.
Chroma is an open-source, lightweight vector database commonly used for local development and small-scale RAG prototypes. Its simple Python-first API makes it popular in research and educational settings.
Milvus is an open-source vector database developed by Zilliz, designed for large-scale enterprise deployments with heterogeneous index types and tiered storage.
Hybrid Search
Pure vector search excels at semantic similarity but may miss documents containing precise technical terms, named entities, or numeric identifiers that appear verbatim in the query. Hybrid search runs a vector similarity query and a traditional keyword (BM25) query side by side, then fuses and reranks the two result sets so that retrieval is both semantically aware and precise. Weaviate, Qdrant, and Elasticsearch all offer native hybrid search capabilities. Hybrid retrieval has become a common default for production RAG pipelines.
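One common way to merge the two result lists is reciprocal rank fusion (RRF), which uses only each document's rank in each list and so avoids having to put BM25 scores and vector similarities on the same scale. The text above does not say which fusion method any given system uses, so the sketch below is an illustration of the general idea; the document ids are hypothetical, and `k=60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; each appearance contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical top-3 results from each retriever, best first.
vector_hits = ["doc3", "doc1", "doc7"]
keyword_hits = ["doc1", "doc9", "doc3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc1', 'doc3', 'doc9', 'doc7']
```

Documents that appear in both lists (`doc1`, `doc3`) rise to the top, while documents found by only one retriever are kept but ranked lower.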
References
- DataCamp. (2025). The Top 5 Vector Databases. DataCamp Blog. https://www.datacamp.com/blog/the-top-5-vector-databases
- Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4).
- InfoQ. (2025). Pinecone Introduces Dedicated Read Nodes in Public Preview for Predictable Vector Workloads. https://www.infoq.com/news/2025/12/pinecone-drn-vector-workloads/
- Johnson, J., Douze, M., & Jégou, H. (2019). Billion-Scale Similarity Search with GPUs. IEEE Transactions on Big Data, 7(3).