What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Retrieval-Augmented Generation

A technique that enhances large language model outputs by retrieving relevant documents from an external knowledge base at inference time, grounding responses in up-to-date and domain-specific information.

6 min read · Last updated May 2026 · Applications

Retrieval-Augmented Generation (RAG) is an AI framework that combines the parametric knowledge stored in a large language model (LLM) with non-parametric information retrieved from an external corpus at query time. Rather than relying solely on patterns learned during pre-training, a RAG system dynamically fetches relevant documents or passages and includes them in the prompt context provided to the model, allowing it to generate responses grounded in specific, current, or proprietary data.[^1]

Background

Large language models acquire knowledge by training on vast corpora, but this knowledge is frozen at the training cutoff date. Models may also hallucinate — producing plausible-sounding but factually incorrect statements — particularly when asked about niche topics, recent events, or proprietary information not present in the training data. RAG addresses these limitations without the expense and complexity of retraining or fine-tuning the model.

The foundational paper by Lewis et al. at Meta AI (then Facebook AI Research), published in 2020, demonstrated that augmenting a pre-trained language model with a dense retrieval component substantially improved performance on open-domain question answering tasks.[^2]

How RAG Works

A RAG pipeline operates in four broad stages.

Ingestion involves processing a document corpus into retrievable chunks. Documents are split into segments (often a few hundred tokens each), and each segment is converted into a dense vector representation — an embedding — using a sentence encoder or another embedding model. These vectors are stored in a vector database alongside the original text.
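The ingestion stage can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `chunk_words`, `toy_embed`, and `ingest` are hypothetical names, words stand in for tokens, and the hashed bag-of-words vector is only a placeholder for a real embedding model. The plain list returned by `ingest` plays the role of the vector database.

```python
import math
import zlib


def chunk_words(text, size=200, overlap=40):
    """Split text into overlapping windows of `size` words (words approximate tokens)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text, dim=64):
    """Placeholder for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def ingest(documents, size=200, overlap=40):
    """Return (chunk_text, vector) records, standing in for a vector store."""
    store = []
    for doc in documents:
        for chunk in chunk_words(doc, size, overlap):
            store.append((chunk, toy_embed(chunk)))
    return store
```

The overlap between consecutive chunks is a common way to avoid cutting a relevant sentence in half at a chunk boundary; the specific values of 200 and 40 are arbitrary defaults chosen for the sketch.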

Retrieval occurs when a user submits a query. The query is encoded into the same vector space as the stored documents, using the same embedding model that was applied at ingestion. The vector database then performs an approximate nearest-neighbour search, returning the top-k document chunks whose embeddings are most similar to the query embedding. With a suitable index, this similarity search typically completes in milliseconds, even across millions of stored vectors.
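As a rough illustration of the retrieval step, the sketch below ranks stored chunks by cosine similarity to a query vector. It is an exact, brute-force search over a small in-memory list; a real vector database would use an approximate index to reach the same result faster at scale. The names `cosine` and `top_k`, and the `(chunk_text, vector)` record layout, are assumptions made for this example.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, store, k=3):
    """Return the k best (score, chunk_text) pairs, highest score first.

    `store` is a list of (chunk_text, vector) records.
    """
    scored = ((cosine(query_vec, vec), text) for text, vec in store)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


store = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
results = top_k([1.0, 0.0], store, k=2)
print([text for _, text in results])
```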

Augmentation combines the retrieved document chunks with the original user query into a structured prompt. This prompt is passed to the LLM, providing it with relevant context. The amount of retrieved content is limited by the model's context window size, though this constraint has relaxed significantly as many models now support contexts of 128,000 tokens and beyond.
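A minimal sketch of the augmentation step might look like the following. The function name `build_prompt`, the instruction wording, and the use of a word count as a stand-in for a token budget are all assumptions for illustration; a real system would count tokens with the model's own tokenizer.

```python
def build_prompt(question, chunks, max_context_words=1500):
    """Pack ranked chunks into a prompt until a rough word budget is reached.

    `chunks` is assumed to be ordered best first, so lower-ranked chunks
    are the ones dropped when the budget runs out.
    """
    selected = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        length = len(chunk.split())
        if used + length > max_context_words:
            break
        selected.append(f"[{number}] {chunk}")
        used += length
    context = "\n\n".join(selected)
    return (
        "Answer the question using only the numbered context passages. "
        "Cite passages by number, and say so if the context is insufficient.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the passages in the prompt is one simple way to make source attribution possible later, since the model can be asked to refer to passages by their number.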

Generation is the final step in which the LLM reads the augmented prompt and produces a response, drawing on both the retrieved context and its trained knowledge. Citations can be extracted by identifying which retrieved passages influenced the output.
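One simple way to recover citations, assuming the prompt numbered the retrieved passages and asked the model to cite them as `[n]`, is to parse those markers out of the generated answer. The helper below is a hypothetical sketch of that idea; it only detects explicit markers and does not measure which passages actually influenced the output.

```python
import re


def extract_citations(answer, chunks):
    """Map [n] markers in the answer to retrieved chunks (1-based numbering).

    Markers that point outside the retrieved list are ignored, and each
    passage is reported once, in order of first mention.
    """
    cited = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        index = int(match.group(1))
        if 1 <= index <= len(chunks) and index not in cited:
            cited.append(index)
    return [(index, chunks[index - 1]) for index in cited]
```

Ignoring out-of-range markers guards against the model inventing a passage number that was never supplied.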

Retrieval Strategies

Several retrieval approaches exist, each with different trade-offs:

| Strategy | Description | Best for |
|----------|-------------|----------|
| Dense retrieval | Nearest-neighbour search in embedding space | Semantic similarity, paraphrase matching |
| Sparse retrieval | BM25 keyword-based ranking | Exact term matching, named entities |
| Hybrid retrieval | Combining dense and sparse scores | Balanced precision and recall |
| Reranking | Cross-encoder scoring of top-k candidates | High-stakes accuracy, small latency budget |

Hybrid retrieval, which combines vector similarity with keyword search, has become a common default in production RAG systems because it handles both semantically phrased queries and specific technical terms, such as product codes or named entities, reliably.
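The article does not specify how dense and sparse results are combined. One widely used option is reciprocal rank fusion (RRF), which merges ranked lists using only rank positions, so the two scoring scales never need to be calibrated against each other. The sketch below assumes each retriever has already returned a list of document IDs, best first; the function name and the constant `k=60` (a conventional default) are choices made for this example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. Ties are broken by document ID.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["d1", "d2", "d3"]   # hypothetical output of a vector search
sparse_hits = ["d3", "d1", "d4"]  # hypothetical output of a BM25 search
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Documents that appear near the top of both lists rise to the top of the fused ranking, while a document found by only one retriever is still kept as a candidate.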

Advanced RAG Patterns

Beyond the basic pipeline, several design patterns have emerged for more demanding applications. Corrective RAG (CRAG) evaluates the quality of retrieved documents and falls back to web search if the local retrieval is insufficient. Self-RAG introduces a reflection step where the model decides whether to retrieve, judges the relevance of retrieved documents, and critiques its own generated output. GraphRAG, developed by Microsoft Research, constructs a knowledge graph over the document corpus instead of relying only on raw vector embeddings, enabling multi-hop reasoning over structured relationships.[^3]
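The corrective pattern can be illustrated with a heavily simplified control-flow sketch. It is not the published CRAG method, which uses a trained evaluator and more than two outcomes; here `local_search`, `web_search`, and `grade` are hypothetical callables supplied by the caller, and the threshold values are arbitrary.

```python
def corrective_retrieve(query, local_search, web_search, grade,
                        threshold=0.5, min_good=1):
    """Return (documents, source), preferring local hits that pass a quality check.

    local_search(query) and web_search(query) return lists of documents;
    grade(query, doc) returns a relevance score between 0 and 1.
    """
    local_hits = local_search(query)
    good = [doc for doc in local_hits if grade(query, doc) >= threshold]
    if len(good) >= min_good:
        return good, "local"
    # Local retrieval judged insufficient: fall back to the external source.
    return web_search(query), "web"
```

Passing the retrievers and the grader in as functions keeps the fallback logic independent of any particular vector database, search API, or evaluation model.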

Comparison with Fine-Tuning

RAG and fine-tuning are complementary rather than competing approaches to specialising an LLM for a domain.

Fine-tuning updates the model's weights to encode domain knowledge, improving the model's general behaviour in that domain but requiring periodic retraining as knowledge evolves. It is effective for learning communication styles, domain terminology, and consistent output formats.

RAG keeps the base model unchanged and provides knowledge at inference time, making it easier to update the knowledge base without touching the model. It is more suitable when the underlying information changes frequently, when strict source attribution is required, or when the knowledge base is too large to encode into model weights.

In practice, many production systems employ both: a fine-tuned model paired with a RAG retrieval layer.

Infrastructure Requirements

A production RAG system requires several components: an embedding model to vectorise documents and queries, a vector database (such as Pinecone, Weaviate, Qdrant, or pgvector) for storage and retrieval, a chunking strategy to segment documents appropriately, and an LLM capable of synthesising retrieved context. Orchestration frameworks such as LangChain and LlamaIndex provide pre-built abstractions for wiring these components together.

Malaysian Context — RAG in Enterprise and Government Applications

Retrieval-augmented generation has become the dominant architecture for enterprise AI deployments in Malaysia, particularly in sectors where information must be current, auditable, and grounded in institutional documents such as regulations, policies, and product catalogues.

In the banking sector, Maybank and CIMB have implemented RAG-based systems that allow customer service agents and internal staff to query large internal document repositories — including product documentation, compliance guidelines, and Bank Negara Malaysia (BNM) circulars — using natural language. BNM's ongoing AI governance consultations emphasise the importance of explainability and source attribution in financial AI systems, requirements that RAG architectures are well-suited to satisfy by surfacing cited document passages alongside answers.

Malaysian government agencies have explored RAG for digitising citizen services. Agencies under the MyDigital Blueprint initiative, including the Malaysian Administrative Modernisation and Management Planning Unit (MAMPU), have piloted internal knowledge bases that allow civil servants to query policy documents and standard operating procedures through RAG-powered interfaces, reducing reliance on manual document search.

In healthcare, Hospital Universiti Kebangsaan Malaysia (HUKM) and private hospital groups such as Sunway Medical Centre have investigated RAG for clinical decision support, enabling clinicians to query medical literature and internal treatment protocols with grounded, cited responses. The Personal Data Protection Act (PDPA) is a central consideration in these deployments, as patient data must not enter the retrieval corpus without appropriate anonymisation or consent.

Technology companies such as Telekom Malaysia (TM) and Maxis have deployed RAG for internal IT helpdesk automation, ingesting technical documentation and network configuration guides into vector databases. Malaysia's AI talent pool, supported by HRD Corp-funded training programmes and Universiti Malaya's postgraduate AI curriculum, has produced a growing cohort of engineers capable of building and maintaining RAG pipelines using frameworks such as LangChain and LlamaIndex.

References

[^1]: IBM. (2024). What is Retrieval-Augmented Generation (RAG)? IBM Think. https://www.ibm.com/think/topics/retrieval-augmented-generation
[^2]: Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33.
[^3]: Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., & Larson, J. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. Microsoft Research.
[^4]: Pinecone. (2025). Retrieval-Augmented Generation: A Technical Overview. https://www.pinecone.io/learn/retrieval-augmented-generation/

Tags: RAG, retrieval, vector database, LLM, knowledge base

Abbreviation	RAG
Introduced	2020 (Lewis et al., Meta AI)
Type	Prompting and retrieval technique
Key use	Document Q&A, enterprise search, knowledge management
Related	Vector database, embedding, semantic search, prompt engineering
