Cosine Similarity
Cosine similarity is a measure of similarity between two non-zero vectors equal to the cosine of the angle between them, widely used to compare embeddings in search and machine learning.
Cosine similarity measures how alike two vectors are by computing the cosine of the angle between them. It is defined as the dot product of the vectors divided by the product of their magnitudes, written cos(theta) = (A · B) / (||A|| ||B||). The result lies between -1 and 1, where 1 means the vectors point in exactly the same direction, 0 means they are orthogonal and share no directional component, and -1 means they point in opposite directions. For vectors whose components are all non-negative, such as term-frequency counts, the value ranges from 0 to 1. The measure is undefined when either vector is the zero vector, because its magnitude appears in the denominator.
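The definition translates almost directly into code. The following is a minimal sketch in plain Python rather than a reference implementation; the function name `cosine_similarity` and the example vectors are illustrative choices, not taken from any particular library.

```python
import math

def cosine_similarity(a, b):
    """Return cos(theta) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))      # A · B
    norm_a = math.sqrt(sum(x * x for x in a))   # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))   # ||B||
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Illustrative vectors covering the three reference cases.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])         # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])   # right angle
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])   # opposite direction
```

Up to floating-point rounding, the three results are 1, 0 and -1 respectively.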
Why direction rather than distance
The defining characteristic of cosine similarity is that it depends only on the orientation of the vectors, not their length. Two documents, one short and one long, that discuss the same topic in the same proportions will have similar direction even though their raw word counts differ greatly in magnitude. By normalising away magnitude, cosine similarity focuses on the pattern of features rather than their absolute scale, which is usually what matters when comparing meaning or composition.
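The short-versus-long document case can be illustrated with made-up term counts. In this sketch the vectors `short_doc` and `long_doc` are hypothetical: the second is simply the first scaled by ten, so the proportions are identical while the magnitudes differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical term counts over the same four-word vocabulary.
short_doc = [2, 1, 0, 3]
long_doc = [20, 10, 0, 30]   # same proportions, ten times the counts

cos_sim = cosine_similarity(short_doc, long_doc)   # depends on direction only
distance = math.dist(short_doc, long_doc)          # Euclidean, depends on scale
```

Here `cos_sim` is 1 up to rounding, while `distance` is large (about 33.7), which shows why a magnitude-sensitive measure would treat the two documents as dissimilar.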
This contrasts with Euclidean distance, which is sensitive to magnitude and can report two semantically similar but differently scaled vectors as far apart. When vectors are first normalised to unit length, the two measures become monotonically related: ||A - B||^2 = 2(1 - cos(theta)). Ranking by highest cosine similarity is then equivalent to ranking by smallest Euclidean distance.
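For unit vectors the relationship is the identity ||A - B||^2 = 2(1 - cos(theta)), which follows from expanding the squared norm. The sketch below checks it numerically and confirms that both measures give the same ranking; the query and candidate vectors are arbitrary illustrative values.

```python
import math

def normalise(v):
    """Scale a non-zero vector to unit length."""
    norm = math.hypot(*v)
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary example vectors, normalised before comparison.
query = normalise([1.0, 2.0, 3.0])
candidates = [normalise(v) for v in (
    [3.0, 2.0, 1.0],
    [1.0, 2.0, 2.5],
    [-1.0, 0.5, 0.0],
)]

# Check ||q - c||^2 == 2 * (1 - cos) for every candidate.
identity_holds = all(
    math.isclose(math.dist(query, c) ** 2, 2.0 * (1.0 - dot(query, c)),
                 rel_tol=1e-9, abs_tol=1e-12)
    for c in candidates
)

# Rank candidate indices by descending cosine and by ascending distance.
indices = range(len(candidates))
by_cosine = sorted(indices, key=lambda i: -dot(query, candidates[i]))
by_distance = sorted(indices, key=lambda i: math.dist(query, candidates[i]))
```

Both orderings place the second candidate first and the third candidate last.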
Role in embeddings and search
Cosine similarity is the workhorse comparison in modern systems built on embeddings, the dense numerical representations of text, images or other data produced by neural networks. In semantic search, a user query is converted into an embedding and compared against a collection of stored embeddings; the items with the highest cosine similarity are returned as the most relevant. The same principle underpins retrieval-augmented generation, recommendation systems, duplicate detection, and clustering.
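The retrieval step in semantic search can be sketched as an exhaustive top-k scan. Everything specific here is an assumption made for illustration: the function `top_k`, the item identifiers, and the three-dimensional vectors, which stand in for embeddings that a real system would obtain from a neural model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query_embedding, stored, k=2):
    """Return the k stored items most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_embedding, vector), item_id)
        for item_id, vector in stored.items()
    ]
    scored.sort(reverse=True)   # highest similarity first
    return [(item_id, score) for score, item_id in scored[:k]]

# Toy stand-ins for stored embeddings (hypothetical values).
stored = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]   # toy stand-in for an embedded user query

results = top_k(query, stored, k=2)
```

With these values the two pet-related items are returned ahead of the unrelated one. An exhaustive scan like this costs time proportional to the collection size, which is what motivates approximate indexes at scale.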
Vector databases used for large-scale retrieval implement approximate nearest-neighbour search optimised for cosine similarity or the closely related inner product, allowing similarity queries over millions or billions of vectors, typically within milliseconds. Because the dot product of unit-normalised vectors equals their cosine similarity, many systems normalise vectors once when they are stored and compute a simple inner product at query time for efficiency.
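The normalise-once, inner-product-at-query-time pattern can be sketched with a small in-memory index. `UnitVectorIndex` is a hypothetical class written for this example; it performs an exact brute-force scan, not the approximate nearest-neighbour search that production vector databases use.

```python
import math

def normalise(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise the zero vector")
    return [x / norm for x in v]

class UnitVectorIndex:
    """Hypothetical brute-force index that stores unit-length vectors."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, item_id, vector):
        # Pay the normalisation cost once, at insertion time.
        self._ids.append(item_id)
        self._vectors.append(normalise(vector))

    def search(self, query, k=1):
        # On unit vectors the inner product equals cosine similarity.
        q = normalise(query)
        scores = [sum(x * y for x, y in zip(q, v)) for v in self._vectors]
        order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        return [(self._ids[i], scores[i]) for i in order[:k]]

index = UnitVectorIndex()
index.add("a", [3.0, 4.0])
index.add("b", [4.0, -3.0])
hits = index.search([6.0, 8.0], k=1)   # query is parallel to item "a"
```

The single hit is item "a" with a score of 1 up to rounding, even though the query is twice as long as the stored vector.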
A worked intuition
Consider three short texts represented as term-frequency vectors. Two that share many of the same words, regardless of how long each text is, will have a small angle between their vectors and thus a cosine similarity near 1. A text on an unrelated subject will share few terms, giving a near-orthogonal vector and a similarity near 0. This simple geometry generalises directly to high-dimensional learned embeddings, where each of hundreds or thousands of dimensions encodes some latent feature.
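The three-text intuition can be reproduced with whitespace-tokenised word counts. This is a deliberately simple sketch: the sample sentences are invented, and `term_frequency_cosine` is an illustrative helper that ignores punctuation handling, stemming and term weighting.

```python
import math
from collections import Counter

def term_frequency_cosine(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b)

# Invented sample texts: two on the same subject, one unrelated.
short_text = "the cat sat on the mat"
long_text = "the cat sat on the mat and the cat sat on the rug by the mat"
unrelated = "quarterly revenue grew faster than analysts expected"

related_score = term_frequency_cosine(short_text, long_text)
unrelated_score = term_frequency_cosine(short_text, unrelated)
```

The two related texts score about 0.96 despite their different lengths, while the unrelated text shares no terms with the first and scores exactly 0.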
| Measure | Sensitive to magnitude | Typical use |
| --- | --- | --- |
| Cosine similarity | No | Embedding and document comparison |
| Dot product | Yes | Scoring with unnormalised vectors |
| Euclidean distance | Yes | Geometric nearest-neighbour search |
References
- Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
- Mikolov, T. et al. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781.
- Johnson, J., Douze, M., and Jégou, H. (2019). Billion-Scale Similarity Search with GPUs. IEEE Transactions on Big Data.