Natural Language Generation

7 min read | Last updated June 2026 | Foundations

Natural Language Generation (NLG) is a subfield of artificial intelligence concerned with automatically producing human-readable text from structured data, semantic representations, or other machine-readable inputs. NLG and Natural Language Understanding (NLU) are commonly described as the two principal components of Natural Language Processing (NLP). While NLU interprets human language, NLG produces it, enabling machines to communicate information in fluent, contextually appropriate prose.

Historical Background

Early NLG systems emerged in the 1970s from work in computational linguistics and knowledge representation. Rule-based systems in this tradition, such as MUMBLE (McDonald, 1983) and FUF/SURGE (Elhadad, 1992), relied on hand-crafted grammars and templates to assemble sentences from structured knowledge bases. Their output was grammatically correct, but each new domain required extensive manual engineering.

Statistical NLG emerged in the 2000s, borrowing techniques from machine translation and corpus linguistics. Systems learned surface realisations from aligned data, reducing the need for hand-crafted rules while improving coverage. However, statistical methods struggled with long-range coherence and factual accuracy.

The advent of deep learning, and particularly the Transformer architecture introduced by Vaswani et al. in 2017, marked a turning point. Pre-trained language models such as GPT-2 (2019), T5 (2019), and GPT-3 (2020) demonstrated that large-scale neural models trained on internet-scale text could generate fluent, diverse, and contextually rich text with minimal task-specific fine-tuning.

Core Pipeline

Classical NLG systems decompose generation into a pipeline of discrete stages.

Content determination involves selecting which information from the input to express, filtering irrelevant facts and ranking the rest by importance.

Discourse planning orders the selected content into a coherent narrative structure, establishing relationships such as cause-effect, contrast, or elaboration between propositions.

Sentence aggregation groups related propositions into single sentences to avoid choppy, list-like outputs.

Lexicalisation chooses the specific words and phrases to express each proposition, drawing on domain vocabulary and stylistic constraints.

Referring expression generation decides how to refer to entities (by name, pronoun, or definite description) to maintain clarity while avoiding repetition.

Surface realisation converts the abstract sentence plan into grammatical text, handling morphology, agreement, punctuation, and word order.
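The stages above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not a reference implementation: the Fact schema, the PHRASES table, the importance threshold, and the rule of at most two facts per sentence are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """One structured input fact (hypothetical schema for this sketch)."""
    city: str
    metric: str
    value: float
    importance: int  # higher means more newsworthy


# Lexicalisation table (assumed wording): one verb phrase per metric.
PHRASES = {
    "temperature": "will reach {v:.0f} degrees Celsius",
    "rainfall": "can expect {v:.0f} mm of rain",
    "wind": "will see winds of {v:.0f} km/h",
}


def determine_content(facts, min_importance=2):
    """Content determination: drop minor facts, rank the rest by importance."""
    kept = [f for f in facts if f.importance >= min_importance]
    return sorted(kept, key=lambda f: f.importance, reverse=True)


def plan_discourse(facts):
    """Discourse planning: keep all facts about one city together."""
    groups = {}
    for fact in facts:
        groups.setdefault(fact.city, []).append(fact)
    return groups


def lexicalise(fact):
    """Lexicalisation: choose the words for a single fact."""
    return PHRASES[fact.metric].format(v=fact.value)


def generate_report(facts, per_sentence=2):
    """Run every stage and return the finished text."""
    sentences = []
    for city, group in plan_discourse(determine_content(facts)).items():
        mentioned = False
        # Sentence aggregation: merge up to `per_sentence` facts per sentence.
        for i in range(0, len(group), per_sentence):
            chunk = group[i:i + per_sentence]
            # Referring expression: name first, pronoun on later mentions.
            subject = "It" if mentioned else city
            mentioned = True
            # Surface realisation: join phrases and add punctuation.
            body = " and ".join(lexicalise(f) for f in chunk)
            sentences.append(f"{subject} {body}.")
    return " ".join(sentences)


sample = [
    Fact("George Town", "temperature", 33, 3),
    Fact("George Town", "rainfall", 12, 2),
    Fact("George Town", "humidity", 80, 1),  # filtered out as unimportant
    Fact("George Town", "wind", 15, 2),
    Fact("Kuching", "rainfall", 40, 3),
]
print(generate_report(sample))
```

Each function maps to one classical stage, which makes it easy to see why such systems needed new rules and vocabulary for every domain.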

Modern neural NLG models collapse several of these stages into an end-to-end learned function, implicitly performing content determination, discourse planning, and surface realisation within a single forward pass through the network.

Neural Approaches

Contemporary NLG is dominated by large language models (LLMs) based on the Transformer decoder architecture. These models are pre-trained on massive text corpora using a next-token prediction objective, learning statistical regularities of language at scale. Fine-tuning or prompting then adapts the model to specific generation tasks.
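The next-token prediction objective can be illustrated with a deliberately tiny stand-in. The sketch below assumes a bigram count table in place of a Transformer and a four-sentence toy corpus; an LLM conditions on far longer contexts with learned parameters, but the generation loop (predict a distribution, pick a token, append, repeat) has the same shape.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Convert the counts for one context into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}


def generate(counts, max_tokens=10):
    """Greedy autoregressive decoding: always take the most likely next token."""
    token, output = "<s>", []
    for _ in range(max_tokens):
        probs = next_token_probs(counts, token)
        if not probs:
            break  # unseen context: nothing to predict
        token = max(probs, key=probs.get)
        if token == "</s>":
            break  # the model predicted end of sentence
        output.append(token)
    return " ".join(output)


# Toy corpus, invented for this example.
corpus = [
    "the report is ready",
    "the report is late",
    "the report is ready",
    "a summary is ready",
]
model = train_bigram(corpus)
print(next_token_probs(model, "is"))
print(generate(model))
```

Greedy decoding is only one option; production systems often sample from the distribution instead, trading determinism for diversity.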

Sequence-to-sequence models with encoder-decoder architectures — such as BART and T5 — are widely used for conditional generation tasks where the output depends on a specific input, such as summarisation, translation, or data-to-text generation. The encoder processes the source input and the decoder generates the target text token by token, attending to relevant parts of the encoded representation.
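How a decoder step "attends" to the encoded input can be sketched with scaled dot-product attention, the mechanism described by Vaswani et al. (2017). The version below is a simplification under stated assumptions: it handles a single query vector, uses the encoder states as both keys and values, and omits the learned projections and multiple heads that real models such as BART and T5 use. The vectors are made-up numbers.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, encoder_states):
    """Attend from one decoder query over all encoder states.

    Returns the attention weights and the weighted sum of encoder states.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, state)) / scale
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Three encoded source positions and one decoder query (illustrative values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [4.0, 0.0]
weights, context = cross_attention(query, encoder_states)
print(weights)  # the first source position receives the largest weight
print(context)
```

The context vector is what the decoder combines with its own state before predicting the next target token.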

Instruction-tuned models such as GPT-4 and Claude respond to natural language prompts that specify the desired output format, style, and content constraints, making NLG accessible without specialised training pipelines.

Applications

NLG powers a wide range of commercial and research applications.

Automated journalism uses NLG to generate templated news articles from structured data such as financial earnings reports, sports scores, and weather forecasts. Companies such as Automated Insights and Narrative Science built platforms that produce such articles in large volumes for wire services and corporate clients.

Business intelligence tools use NLG to convert data visualisations and dashboard metrics into executive summaries in plain English, making insights accessible to non-technical stakeholders. Chatbots and virtual assistants rely on NLG to formulate responses that are grammatically natural and tonally appropriate to the conversation context.

Clinical documentation in healthcare leverages NLG to generate discharge summaries, radiology reports, and patient letters from structured electronic health record data, reducing clinician documentation burden. Code generation, exemplified by systems such as GitHub Copilot, treats programming languages as a generation target, producing functional code from natural language specifications.

Evaluation

Evaluating NLG outputs is a persistent research challenge. Automatic metrics such as BLEU, ROUGE, and METEOR measure surface-level overlap between generated and reference text, but correlate imperfectly with human judgements of fluency, coherence, and factual accuracy. Newer metrics such as BERTScore compute semantic similarity in embedding space, partially addressing the limitation of n-gram overlap. Human evaluation remains the gold standard, assessing dimensions such as fluency, adequacy, and overall quality through crowd-sourced or expert annotation.
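A simplified overlap metric shows why surface-level scores correlate only imperfectly with human judgement. The sketch below computes unigram precision, recall, and F1 in the spirit of ROUGE-1; it assumes whitespace tokenisation and lowercasing, and it is not the official ROUGE implementation. The example sentences are invented.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall and F1 (a simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "the patient was discharged on monday"
wrong_day = "the patient was discharged on friday"   # factually wrong
paraphrase = "the patient went home monday"          # factually right

print(rouge_1(wrong_day, reference))
print(rouge_1(paraphrase, reference))
```

In this example the factually wrong sentence scores higher than the correct paraphrase, because it shares more words with the reference.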

Factual consistency — ensuring that generated text accurately reflects the source input — has emerged as a critical dimension, particularly in healthcare and legal applications. Hallucination, where models generate plausible-sounding but incorrect content, remains an active area of research.
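For data-to-text systems, one very rough consistency check is to confirm that every number in the generated text also appears in the source record. The sketch below is a naive heuristic under stated assumptions: the record fields and the sentence are hypothetical, only numeric values are checked, and a real system would need far more than this (for example, checking that each number is attached to the right entity).

```python
import re


def unsupported_numbers(generated_text, source_record):
    """Return numbers found in the text that do not occur in the source record."""
    source_values = {
        float(v)
        for v in source_record.values()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    }
    found = re.findall(r"\d+(?:\.\d+)?", generated_text)
    return [n for n in found if float(n) not in source_values]


# Hypothetical structured record and generated sentence.
record = {"patient_age": 54, "stay_days": 3, "systolic_bp": 128}
text = "The 54-year-old patient was discharged after 5 days with blood pressure of 128."

print(unsupported_numbers(text, record))  # flags the stay length, which should be 3
```

A check like this catches only one narrow class of hallucination, but it is cheap enough to run on every output before human review.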

Challenges

Despite remarkable progress, NLG faces several open challenges. Controlling factual accuracy requires models to ground outputs in verifiable sources rather than learned statistical associations. Maintaining long-document coherence, ensuring that narratives remain internally consistent across hundreds of sentences, is difficult for autoregressive models with fixed context windows. Style control, generating text that matches a specified author voice, reading level, or cultural register, remains imprecise. Multilingual NLG for low-resource languages is constrained by scarcity of training data.

Malaysian Context — NLG Adoption and Bahasa Malaysia

Natural Language Generation has found practical deployment across several sectors in Malaysia, though adoption is concentrated in larger enterprises and technology-forward organisations. The country's linguistic diversity — with Bahasa Malaysia, English, Mandarin, and Tamil all widely used in commerce and public life — creates both opportunity and complexity for NLG systems.

In Malaysian banking, institutions such as Maybank, CIMB, and RHB have explored NLG for generating personalised financial summaries, customer correspondence, and automated advisory content. Bank Negara Malaysia (BNM) has encouraged the use of technology in financial services through its Financial Technology Regulatory Sandbox, which provides a controlled environment for testing AI-driven communication tools.

Bernama, Malaysia's national news agency, has examined automated text generation for templated news formats such as commodity price reports and corporate earnings summaries, following the global trend towards machine-assisted journalism. Local technology firms including Fusionex and ThoughtFocus have developed NLG-adjacent solutions for business intelligence reporting.

Bahasa Malaysia poses a particular challenge: most large pre-trained language models are trained predominantly on English text, and their NLG capabilities degrade significantly in Malay. Mimos Berhad and researchers at Universiti Malaya and Universiti Putra Malaysia have contributed to Malay NLP resources, but the gap relative to English-language models remains substantial. The Malaysia Digital Economy Corporation (MDEC) has recognised local-language AI as a strategic priority under the MyDigital Blueprint, and several university-led research groups are developing Malay-specific language models that could underpin future NLG systems.

In the public sector, the National AI Office (NAIO) under the Ministry of Digital has emphasised AI-driven citizen services, an area where NLG can play a significant role in generating plain-language explanations of government decisions, tax assessments, and public health advisories. Ensuring that such outputs are accurate, unbiased, and accessible in multiple national languages remains a key design constraint for any Malaysian public-sector NLG deployment.

References

Gatt, A., & Krahmer, E. (2018). Survey of the State of the Art in Natural Language Generation: Core Tasks, Applications and Evaluation. Journal of Artificial Intelligence Research, 61, 65-170.
Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.
Brown, T., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33.
Reiter, E., & Dale, R. (2000). Building Natural Language Generation Systems. Cambridge University Press.
MDEC. (2024). MyDigital Blueprint: AI as a Foundation. Malaysia Digital Economy Corporation.

Tags: nlp, text-generation, language-models, ai-foundations

Type	AI/NLP Subfield
Parent field	Natural Language Processing
Key techniques	Neural LMs, template-based, rule-based
Key use	Report automation, chatbots, content generation
Related	NLP, LLM, Text Summarisation, Machine Translation