Question Answering

Question answering is the natural language processing task of producing accurate answers to questions posed in natural language, often using information retrieval, reading comprehension, or large language models.

5 min read · Last updated May 2026 · Applications

Question answering (QA) is the natural language processing task of producing an accurate answer to a question expressed in natural language. Formulations range from short-span answer extraction over a single paragraph to multi-hop reasoning across many documents, structured query answering over databases, and conversational answering grounded in private corpora.

Subtypes

QA is commonly classified along two dimensions. The first is source format: textual QA reads passages or documents, knowledge-base QA queries structured triples, table QA reasons over tabular data, and visual QA reads images alongside text. The second is answer form: extractive QA returns a span from the source, multiple-choice QA selects an option, abstractive QA generates a free-form answer, and yes/no QA returns a binary judgement.

Open-domain QA answers questions over a large external corpus and typically combines retrieval with reading comprehension. Closed-book QA forces the model to answer from parametric knowledge alone, with no retrieval at inference time. Conversational QA maintains dialogue state and resolves references to earlier turns.

Reading comprehension and SQuAD

Extractive reading comprehension was popularised by the Stanford Question Answering Dataset (SQuAD), introduced in 2016. SQuAD pairs Wikipedia paragraphs with crowdsourced questions and answer spans. Subsequent work added SQuAD 2.0 with unanswerable questions, Natural Questions with real Google search queries, TriviaQA, HotpotQA for multi-hop reasoning, and DROP for discrete reasoning such as counting and arithmetic. By 2019, Transformer encoders such as BERT, RoBERTa, and ALBERT, fine-tuned to predict answer start and end positions, had matched or exceeded the human baseline on the SQuAD leaderboard.
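Span extraction in this style can be illustrated with a minimal sketch. It assumes a model has already produced one start score and one end score per token; the scores, the `best_span` helper, and the `max_len` cap below are made up for illustration and are not taken from any particular library.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices maximising
    start_scores[start] + end_scores[end], with start <= end
    and a span of at most max_len tokens."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within max_len tokens.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Hypothetical per-token scores for the question "When was SQuAD introduced?"
tokens = ["SQuAD", "was", "introduced", "in", "2016", "."]
start_scores = [0.1, 0.0, 0.2, 0.3, 2.5, 0.0]
end_scores = [0.0, 0.1, 0.1, 0.2, 2.9, 0.4]

s, e = best_span(start_scores, end_scores)
answer = " ".join(tokens[s:e + 1])
print(answer)
```

The exhaustive search is quadratic in passage length, which is acceptable for a paragraph; the length cap keeps implausibly long spans out of consideration.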

Open-domain and retrieval-augmented QA

Open-domain QA decomposes into a retriever and a reader. Dense Passage Retrieval and downstream variants encode questions and passages into a shared vector space, with nearest-neighbour search returning the most relevant passages. The retrieved context is then passed to a reader, which may be an extractive span predictor or a generative model. Retrieval-augmented generation (RAG), introduced by Facebook AI Research in 2020, combined a dense retriever and a sequence-to-sequence generator in a single pipeline trained end to end. Production QA systems now commonly use RAG-style architectures backed by vector databases such as Pinecone, Weaviate, Qdrant, or Chroma.
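The retriever step can be sketched in a few lines. This is a toy illustration only: the `embed` function below is a normalised bag-of-words vector standing in for a learned dense encoder, and the brute-force scan stands in for the approximate nearest-neighbour index a vector database would provide. All function names and passages are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy stand-in for a learned encoder: L2-normalised bag of words."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def dot(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Inner product of two sparse vectors (cosine, since both are normalised)."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def retrieve(question: str, passages: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every passage against the question and return the top k."""
    q_vec = embed(question)
    scored = [(dot(q_vec, embed(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


passages = [
    "squad pairs wikipedia paragraphs with crowdsourced questions",
    "dense passage retrieval encodes questions and passages into a shared vector space",
    "exact match and f1 are common extractive qa metrics",
]
top = retrieve("how does dense passage retrieval encode questions", passages, k=1)
print(top[0][1])
```

In a full pipeline the returned passages would be concatenated into the reader's input; swapping `embed` for a trained encoder changes the quality of the ranking but not the shape of the code.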

Large language models as QA systems

Large language models perform QA in zero-shot or few-shot settings without task-specific fine-tuning, drawing on parametric knowledge acquired during pretraining. Chain-of-thought prompting improves reasoning-heavy QA by eliciting intermediate steps. Tool use and function calling extend QA to live data sources, calculators, and code execution. Hybrid systems pair an LLM with retrieval, structured knowledge graphs, or specialised tools to balance recall, factuality, and freshness.
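As a rough illustration of few-shot and chain-of-thought prompting, the sketch below assembles a plain-text prompt from worked examples. The `build_prompt` helper, the "Q / Reasoning / A" layout, and the sample question are assumptions made for this example; real systems use whatever prompt format their model and provider expect.

```python
from typing import List, Tuple


def build_prompt(question: str,
                 examples: List[Tuple[str, str, str]],
                 chain_of_thought: bool = True) -> str:
    """Format (question, reasoning, answer) examples followed by the new question.

    With chain_of_thought=True the prompt ends on "Reasoning:" so the model
    is nudged to write intermediate steps before the final answer.
    An empty examples list yields a zero-shot prompt.
    """
    parts: List[str] = []
    for q, reasoning, a in examples:
        parts.append(f"Q: {q}")
        if chain_of_thought:
            parts.append(f"Reasoning: {reasoning}")
        parts.append(f"A: {a}")
        parts.append("")  # blank line between examples
    parts.append(f"Q: {question}")
    parts.append("Reasoning:" if chain_of_thought else "A:")
    return "\n".join(parts)


# Hypothetical worked example used as the single "shot".
examples = [(
    "A shop sells 3 pens at RM2 each. What is the total?",
    "3 pens times RM2 per pen is RM6.",
    "RM6",
)]
prompt = build_prompt("A shop sells 4 books at RM5 each. What is the total?", examples)
print(prompt)
```

The resulting string would be sent to a language model; no model call is shown here because the API differs between providers.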

Evaluation

Extractive QA is typically scored with exact-match and token-level F1 against reference spans. Multiple-choice QA uses accuracy. Generative QA requires more nuanced evaluation: ROUGE and BLEU capture surface similarity, while learned metrics, natural language inference for entailment, and human ratings assess faithfulness and helpfulness. Benchmark suites such as MMLU, BIG-Bench, and GPQA probe broader knowledge and reasoning, while domain-specific benchmarks such as MedQA, BioASQ, and LegalBench evaluate professional QA.
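Exact match and token-level F1 can be computed as in the sketch below. The normalisation (lowercasing, stripping punctuation and English articles, collapsing whitespace) loosely follows the convention commonly used for SQuAD-style scoring; it is a simplified illustration rather than an official evaluation script, and the sample strings are invented.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalised strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalisation."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(token_f1("in George Town Penang", "George Town"))
```

In the second call the prediction contains both reference tokens plus two extra ones, so recall is 1.0, precision is 0.5, and F1 is 2/3. When a question has several reference answers, the usual practice is to take the maximum score over references.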

Common challenges

Hallucination, retrieval failure, multi-hop reasoning, temporal reasoning, ambiguity resolution, and adversarial robustness are persistent challenges. Faithfulness, meaning that every answer is supported by the retrieved evidence, is a central design objective for enterprise systems. Long-context, multilingual, and low-resource QA remain active research areas.
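A very crude way to flag potentially unfaithful answers is to measure how much of the answer's vocabulary appears in the retrieved evidence. The sketch below is only a lexical proxy under that assumption: the `support_ratio` helper is hypothetical, and a low score suggests an answer worth reviewing rather than proving hallucination. Entailment-based or human checks are needed for anything stronger.

```python
import re
from typing import List


def _tokens(text: str) -> List[str]:
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def support_ratio(answer: str, evidence: List[str]) -> float:
    """Fraction of answer tokens that occur in at least one evidence passage."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    evidence_vocab = {tok for passage in evidence for tok in _tokens(passage)}
    supported = sum(1 for tok in answer_tokens if tok in evidence_vocab)
    return supported / len(answer_tokens)


evidence = ["SQuAD was introduced in 2016 by researchers at Stanford."]
print(support_ratio("SQuAD was introduced in 2016", evidence))
print(support_ratio("SQuAD was introduced in 2018", evidence))
```

The second answer differs from the evidence in a single token, so four of its five tokens are supported. This shows why lexical overlap alone is a weak signal: one wrong year barely moves the score.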

Applications

QA underpins consumer search experiences such as Google's AI Overviews and Bing's chat search, enterprise knowledge assistants over internal documentation, customer support bots, medical decision support, legal research tools, e-discovery, education platforms, and government service portals.

Malaysian Context — QA Deployments in Public and Private Sectors

Question answering systems are widely deployed across Malaysian government and industry. The Malaysia Digital Economy Corporation (MDEC), the Inland Revenue Board (LHDN), the Companies Commission of Malaysia (SSM), and the Department of Personal Data Protection have introduced QA chatbots for citizen and business enquiries. The Ministry of Digital's MyGOV Service Hub uses retrieval-augmented QA over official policy documents. State health departments under the Ministry of Health are piloting QA assistants on clinical guidelines, in compliance with the Personal Data Protection Act 2010.

In banking, Maybank's MAE assistant, CIMB's Octo, RHB's Buddy, and Public Bank's chatbot handle account and product enquiries through QA pipelines. Bank Negara Malaysia's 2024 guidance on AI in financial services requires Malaysian licensees to log retrieved sources for each QA response and to validate faithfulness against authoritative documents. The Securities Commission Malaysia issued similar guidance covering robo-advisory QA for capital market intermediaries.
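Logging retrieved sources alongside each response can be as simple as writing one structured record per answer. The sketch below shows one possible shape; the class names, field names, document identifier, and JSON layout are illustrative assumptions and are not a format prescribed by any regulator or used by any named institution.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SourceRef:
    """One retrieved passage that the answer relied on (hypothetical schema)."""
    doc_id: str
    passage: str


@dataclass
class QALogRecord:
    """A question, the answer given, and the sources retrieved for it."""
    question: str
    answer: str
    sources: List[SourceRef] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested SourceRef dataclasses.
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


# Placeholder content only; not real product information.
record = QALogRecord(
    question="Example customer question",
    answer="Example answer text",
    sources=[SourceRef(doc_id="example-doc-001", passage="Example passage text")],
)
line = record.to_json()
print(line)
```

Writing each record as a single JSON line keeps the log append-only and easy to audit later, for example by re-running a faithfulness check over the stored answer and passages.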

Local QA systems must handle Bahasa Melayu, English, Mandarin, Tamil, and code-mixed input. Research groups at Universiti Sains Malaysia, Universiti Malaya, and MIMOS contribute to multilingual QA datasets. SEA-LION, the Southeast Asian language model initiative led by AI Singapore with Malaysian participation, is a frequent baseline for regional QA evaluation. Local startups including Hoot, Naluri, and Pickatale use QA in customer experience and education products.

References

Rajpurkar, P. et al. (2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text. EMNLP.
Karpukhin, V. et al. (2020). Dense Passage Retrieval for Open-Domain Question Answering. EMNLP.
Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS.
Bank Negara Malaysia. (2024). Discussion Paper on Use of Artificial Intelligence in Financial Services. bnm.gov.my.

Tags: QA, NLP, reading comprehension, retrieval, knowledge base

Type	NLP task
Subtypes	Extractive, abstractive, open-domain, closed-book
Key datasets	SQuAD, Natural Questions, TriviaQA, HotpotQA
Key metrics	Exact match, F1, accuracy
Modern systems	RAG, LLMs, knowledge-grounded models