Large Language Models
Large language models (LLMs) are neural networks trained on massive text corpora to predict and generate coherent natural language; they underpin modern chatbots, code assistants, and generative AI applications. Built on the Transformer architecture introduced by Vaswani et al. (2017), modern LLMs are pre-trained with a self-supervised objective (typically predicting the next token in a sequence) and then fine-tuned for specific behaviours using techniques such as Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., 2022).
LLMs represent the current frontier of natural language processing. The release of ChatGPT in November 2022 brought LLM capabilities to mainstream public awareness; the technology has since reshaped enterprise workflows, software development, content creation, education, and research.
How LLMs Work
The Transformer Architecture
The Transformer replaced earlier recurrent architectures (LSTM, GRU) with a self-attention mechanism that relates every token in a sequence to every other token simultaneously. This allows:
- Parallel training — processing all positions in a sequence at once (vs. sequentially)
- Long-range dependencies — capturing context across thousands of tokens
- Scalability — performance improves predictably with more parameters, data, and compute, as described by empirical scaling laws
Key components: multi-head self-attention, positional encodings, layer normalisation, feed-forward networks, residual connections.
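The self-attention mechanism at the core of these components can be sketched in a few lines of NumPy. This is a single-head, unbatched illustration with random stand-in weights (the projection matrices `Wq`, `Wk`, `Wv` would be learned in a real model), not a production implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Relate every token (query) to every other token (key) at once."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V, weights                          # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))                  # toy token embeddings
# In a real layer, Q, K, V come from learned linear projections of x;
# random matrices stand in for them here.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, w = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)       # (4, 8): one output vector per input token
```

Because every position's scores are computed in one matrix product, the whole sequence is processed in parallel, which is exactly the training-speed advantage over recurrent architectures noted above.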
Pre-training
LLMs are pre-trained on web-scale text corpora (Common Crawl, books, code, Wikipedia, scientific papers), amounting to trillions of tokens. Through next-token prediction, the model learns a compressed statistical model of language that also encodes broad factual knowledge.
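The next-token objective can be illustrated at toy scale with bigram counts: given a context token, predict the most frequent follower observed in the corpus. Real LLMs replace these counts with a deep network conditioned on thousands of prior tokens, but the training signal is the same; the nine-word corpus here is purely illustrative:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# Count, for each token, which tokens follow it and how often.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    """Return the most likely next token under the bigram counts."""
    counts = following[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # 'cat' — follows 'the' twice, 'mat' once
```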
Fine-tuning and Alignment
Raw pre-trained models produce outputs that follow the training distribution but aren't necessarily helpful or safe. Instruction tuning trains models to follow user instructions. RLHF further shapes outputs using human preference feedback, making models more helpful, harmless, and honest.
Context Window
The context window defines how much text an LLM can "see" at once. The original GPT-3 had a 2k-token window; Claude 3 supports up to 200k tokens; some models now reach 1M+ tokens. Longer contexts enable document-level reasoning and multi-document synthesis.
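A practical consequence of a finite window is that applications must budget tokens when stuffing documents into a prompt. A minimal sketch, using whitespace splitting as a crude stand-in for a real tokenizer (actual token counts from a tokenizer such as tiktoken will differ):

```python
def fit_to_window(chunks, budget):
    """Greedily keep whole chunks until the approximate token budget is spent."""
    kept, used = [], 0
    for chunk in chunks:
        n = len(chunk.split())   # crude proxy for a real token count
        if used + n > budget:
            break                # dropping the rest keeps us inside the window
        kept.append(chunk)
        used += n
    return kept

docs = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(fit_to_window(docs, budget=5))  # first two chunks fit (3 + 2 tokens)
```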
Major Models (2024–2026)
| Model | Developer | Parameters | Context |
|-------|-----------|------------|---------|
| GPT-4o | OpenAI | ~200B (est.) | 128k |
| Claude 3.5 Sonnet | Anthropic | Undisclosed | 200k |
| Gemini 1.5 Pro | Google DeepMind | Undisclosed | 1M |
| Llama 3.1 405B | Meta | 405B | 128k |
| Mistral Large | Mistral AI | 123B | 128k |
Capabilities and Limitations
Strengths:
- Text generation, summarisation, translation
- Question answering and reasoning over documents
- Code generation and debugging
- Few-shot learning from examples in context
Limitations:
- Hallucinations — generating plausible but factually incorrect statements
- Knowledge cutoff — training knowledge frozen at the pre-training cutoff date
- Reasoning limits — models struggle with multi-step mathematical reasoning
- Context faithfulness — can ignore or misinterpret provided context
- Sycophancy — tendency to agree with user premises even when wrong
Prompt Engineering
Getting consistent, high-quality output from LLMs requires skill in prompt engineering:
- Clear instructions — be explicit about task, format, and constraints
- Few-shot examples — provide 2–5 examples of desired input-output pairs
- Chain-of-thought — ask the model to reason step by step before answering
- Role assignment — "Act as an experienced Malaysian lawyer…"
- Output format specification — request JSON, markdown tables, or bullet points
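The techniques above compose naturally into a single prompt template. A minimal sketch (the task, examples, and JSON schema below are invented for illustration):

```python
def build_prompt(task, examples, question):
    """Combine role assignment, instructions, format spec, CoT, and few-shot examples."""
    lines = [
        "You are a careful data analyst.",                      # role assignment
        f"Task: {task}",                                        # clear instruction
        'Respond as JSON: {"label": ..., "reason": ...}',       # output format spec
        "Think step by step before answering.",                 # chain-of-thought
        "",
        "Examples:",
    ]
    for inp, out in examples:                                   # few-shot examples
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {question}\nOutput:")
    return "\n".join(lines)

prompt = build_prompt(
    task="Classify the sentiment of the review.",
    examples=[("Loved it!", '{"label": "positive", "reason": "praise"}')],
    question="Terrible battery life.",
)
print(prompt)
```

Ending the prompt with a dangling `Output:` nudges the model to complete the final pair in the same format as the examples.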
Retrieval-Augmented Generation (RAG)
RAG addresses the knowledge cutoff and hallucination problems by:
- Retrieving relevant documents from a vector database at query time
- Providing retrieved documents as context to the LLM
- Grounding the LLM's answer in actual retrieved evidence
RAG is now the dominant pattern for enterprise LLM deployments over private knowledge bases.
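The retrieve-then-ground loop can be sketched end to end with a toy bag-of-words retriever in place of a real vector database and embedding model; the two documents, their IDs, and the query below are invented for illustration:

```python
import math
from collections import Counter

docs = {
    "leave-policy": "Employees accrue 18 days of annual leave per year.",
    "expenses": "Submit expense claims within 30 days with receipts.",
}

def vectorise(text):
    """Bag-of-words counts; a real system would use dense embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Step 1: rank documents by similarity to the query."""
    qv = vectorise(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, vectorise(docs[d])), reverse=True)
    return ranked[:k]

query = "how many days of annual leave do I get?"
# Steps 2–3: pass the retrieved text to the LLM as grounding context.
context = "\n".join(docs[d] for d in retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(retrieve(query))  # ['leave-policy']
```

The "answer using only this context" instruction is what grounds the response: the model is steered toward the retrieved evidence rather than its (possibly stale or hallucinated) parametric knowledge.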
References
- Vaswani, A. et al. (2017). "Attention Is All You Need." NeurIPS 2017.
- Ouyang, L. et al. (2022). "Training language models to follow instructions with human feedback." NeurIPS 2022.
- MDEC (2024). Generative AI Adoption Tracker: Malaysia Enterprise Survey Q1 2024.