What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors equal to the cosine of the angle between them, widely used to compare embeddings in search and machine learning.

4 min read · Last updated June 2026 · Foundations

Cosine similarity measures how alike two vectors are by computing the cosine of the angle between them. It is defined as the dot product of the vectors divided by the product of their magnitudes: cos(theta) = (A · B) / (||A|| ||B||). The result lies between -1 and 1: 1 means the vectors point in exactly the same direction, 0 means they are orthogonal (usually read as unrelated when comparing embeddings or documents), and -1 means they point in opposite directions. For vectors whose components are all non-negative, such as term-frequency vectors, the value ranges from 0 to 1.
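The definition translates almost directly into code. The following is a minimal sketch in plain Python, not a production implementation; the function name `cosine_similarity` and the example vectors are chosen here purely for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return (a . b) / (||a|| * ||b||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The measure is only defined for non-zero vectors.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

same = cosine_similarity([1.0, 0.0], [3.0, 0.0])        # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 2.0])  # right angle
opposite = cosine_similarity([1.0, 0.0], [-2.0, 0.0])   # opposite direction
print(same, orthogonal, opposite)  # prints: 1.0 0.0 -1.0
```

The three calls reproduce the three landmark values of the range: 1, 0 and -1.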

Why direction rather than distance

The defining characteristic of cosine similarity is that it depends only on the orientation of the vectors, not their length. Two documents, one short and one long, that discuss the same topic in the same proportions will have similar direction even though their raw word counts differ greatly in magnitude. By normalising away magnitude, cosine similarity focuses on the pattern of features rather than their absolute scale, which is usually what matters when comparing meaning or composition.
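As a rough illustration of this length invariance, the sketch below compares hypothetical word-count vectors over a made-up three-term vocabulary. The counts are invented for the example and do not come from real documents.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word counts over the same three-term vocabulary.
short_doc = [2, 1, 0]
long_doc = [20, 10, 0]   # same proportions as short_doc, ten times the length
other_doc = [0, 1, 5]    # a different mix of terms

sim_scaled = cosine_similarity(short_doc, long_doc)
sim_other = cosine_similarity(short_doc, other_doc)
print(round(sim_scaled, 6), round(sim_other, 6))
```

Up to floating-point rounding, `sim_scaled` comes out as 1 because the long document is just a scaled copy of the short one, while `sim_other` is much lower because the proportions differ.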

This contrasts with Euclidean distance, which is sensitive to magnitude and can report two semantically similar but differently scaled vectors as far apart. When vectors are first normalised to unit length, cosine similarity and Euclidean distance become monotonically related: ranking items by highest cosine similarity gives the same order as ranking them by smallest Euclidean distance.
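The relationship for unit-length vectors can be checked with a short derivation. Here \hat{A} and \hat{B} denote the unit-normalised versions of A and B; this notation is introduced only for this sketch.

```latex
\[
\|\hat{A} - \hat{B}\|^{2}
  = \|\hat{A}\|^{2} + \|\hat{B}\|^{2} - 2\,\hat{A}\cdot\hat{B}
  = 2 - 2\cos\theta ,
\qquad \text{since } \|\hat{A}\| = \|\hat{B}\| = 1
\text{ and } \hat{A}\cdot\hat{B} = \cos\theta .
\]
```

Because 2 - 2 cos(theta) shrinks as cos(theta) grows, the unit vector nearest in Euclidean distance is also the one with the highest cosine similarity.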

Role in embeddings and search

Cosine similarity is the workhorse comparison in modern systems built on embeddings, the dense numerical representations of text, images or other data produced by neural networks. In semantic search, a user query is converted into an embedding and compared against a collection of stored embeddings; the items with the highest cosine similarity are returned as the most relevant. The same principle underpins retrieval-augmented generation, recommendation systems, duplicate detection, and clustering.
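The retrieval step described above can be sketched as a brute-force ranking. This is only an illustration: the three-dimensional vectors, the item names and the `top_k` helper are hypothetical, and real embeddings would come from a model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=2):
    """Return the k (item_id, score) pairs with the highest cosine similarity."""
    scored = [(item_id, cosine_similarity(query_vec, vec))
              for item_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy stand-ins for stored embeddings (hypothetical values).
stored = {
    "doc_laptops": [0.9, 0.1, 0.0],
    "doc_phones": [0.7, 0.6, 0.1],
    "doc_recipes": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]  # toy stand-in for an embedded user query

results = top_k(query, stored, k=2)
print([item_id for item_id, _ in results])  # prints: ['doc_laptops', 'doc_phones']
```

An exhaustive scan like this is fine for small collections; larger systems usually replace it with an approximate nearest-neighbour index.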

Vector databases, which power large-scale retrieval, typically implement approximate nearest-neighbour search optimised for cosine similarity or the closely related inner product, allowing similarity queries over millions or even billions of vectors in milliseconds. Because the dot product of unit-normalised vectors equals their cosine similarity, many systems store unit-normalised vectors and compute a simple inner product for efficiency.
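The normalise-once, dot-product-later pattern can be sketched as follows. The helper names `normalise` and `dot` and the two example vectors are assumptions made for this illustration, not the API of any particular vector database.

```python
import math

def normalise(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]

def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Full formula, kept here only to compare against the shortcut."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

raw_a = [3.0, 4.0]
raw_b = [4.0, 3.0]

# Normalisation happens once, when the vectors are stored.
unit_a = normalise(raw_a)
unit_b = normalise(raw_b)

via_dot = dot(unit_a, unit_b)                 # cheap per-query comparison
via_cosine = cosine_similarity(raw_a, raw_b)  # full formula on the raw vectors
print(math.isclose(via_dot, via_cosine))      # prints: True
```

The saving is that the two square roots and the division are paid once per stored vector rather than once per comparison.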

A worked intuition

Consider three short texts represented as term-frequency vectors. Two that share many of the same words, regardless of how long each text is, will have a small angle between their vectors and thus a cosine similarity near 1. A text on an unrelated subject will share few terms, giving a near-orthogonal vector and a similarity near 0. This simple geometry generalises directly to high-dimensional learned embeddings, where each of hundreds or thousands of dimensions encodes some latent feature.
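The three-text intuition can be made concrete with a small sketch. The sentences below are invented for the example, and the whitespace tokenisation in `tf_vector` is a deliberate simplification of real text processing.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def tf_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Hypothetical example texts: a and b share a topic, c does not.
texts = {
    "a": "the cat sat on the mat",
    "b": "the cat sat on the mat and the cat slept on the mat",
    "c": "stock prices rose sharply today",
}

# Shared vocabulary so that every vector has the same dimensions.
vocabulary = sorted({word for t in texts.values() for word in t.lower().split()})
vectors = {name: tf_vector(t, vocabulary) for name, t in texts.items()}

sim_ab = cosine_similarity(vectors["a"], vectors["b"])
sim_ac = cosine_similarity(vectors["a"], vectors["c"])
print(round(sim_ab, 3), sim_ac)
```

Texts a and b differ in length but share most of their words, so `sim_ab` is close to 1. Text c shares no words with a, so the vectors are orthogonal and `sim_ac` is 0.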

| Measure | Sensitive to magnitude | Typical use |
| --- | --- | --- |
| Cosine similarity | No | Embedding and document comparison |
| Dot product | Yes | Scoring with unnormalised vectors |
| Euclidean distance | Yes | Geometric nearest-neighbour search |

Malaysian Context — Powering Local Search and Recommendation

Cosine similarity underlies many AI applications being built by Malaysian companies and institutions. E-commerce and ride-hailing platforms operating in the country, including Grab and regional marketplaces, use embedding-based recommendation and search in which cosine similarity ranks products, merchants or content for users. As these firms expand multilingual support for Bahasa Melayu, English and Chinese, embedding comparison helps match queries to relevant items across languages.

Homegrown language models such as ILMU and MaLLaM produce embeddings tuned for Malaysian languages and dialects, and downstream retrieval systems rely on cosine similarity to surface relevant Bahasa Melayu documents. This matters for public-sector knowledge bases and for banks such as Maybank and CIMB building internal semantic search over local policy and regulatory text.

Malaysia's growing data-centre and AI-cloud capacity, including investments by YTL and multinational providers, gives local organisations the infrastructure to run vector databases at scale. Training delivered through MDEC programmes and HRD Corp-funded courses commonly covers embeddings and similarity search, equipping Malaysian developers to build retrieval and recommendation features for regional markets across Southeast Asia.

Type	Vector similarity measure
Range	-1 to 1 (0 to 1 for non-negative vectors)
Formula	(A · B) / (||A|| ||B||)
Key property	Invariant to vector magnitude
Main use	Comparing embeddings, retrieval
Related	Dot product, Euclidean distance