Large Language Models
Large language models (LLMs) are neural networks trained on massive text corpora to predict and generate coherent natural language; they underpin modern chatbots, code assistants, and generative AI applications. Built on the Transformer architecture introduced by Vaswani et al. (2017), modern LLMs are pre-trained with a self-supervised objective (typically predicting the next token in a sequence) and then fine-tuned for specific behaviours using techniques such as Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., 2022).
LLMs represent the current frontier of natural language processing. The release of ChatGPT in November 2022 brought LLM capabilities to mainstream public awareness; the technology has since reshaped enterprise workflows, software development, content creation, education, and research.
How LLMs Work
The Transformer Architecture
The Transformer replaced earlier recurrent architectures (LSTM, GRU) with a self-attention mechanism that relates every token in a sequence to every other token simultaneously. This allows:
- Parallel training — processing all positions in a sequence at once (vs. sequentially)
- Long-range dependencies — capturing context across thousands of tokens
- Scalability — performance improves predictably with more parameters, data, and compute, as described by empirical scaling laws
Key components: multi-head self-attention, positional encodings, layer normalisation, feed-forward networks, residual connections.
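The self-attention mechanism at the core of these components can be sketched in a few lines of NumPy. This is a single-head, unbatched illustration with random stand-in weights (the projection matrices `Wq`, `Wk`, `Wv` would be learned in a real model), not a production implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Relate every token (query) to every other token (key) at once."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V, weights                          # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))                  # toy token embeddings
# In a real layer, Q, K, V come from learned linear projections of x;
# random matrices stand in for them here.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, w = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)       # (4, 8): one output vector per input token
```

Because every position's scores are computed in one matrix product, the whole sequence is processed in parallel, which is exactly the training-speed advantage over recurrent architectures noted above.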
Pre-training
LLMs are pre-trained on web-scale text corpora (Common Crawl, books, code, Wikipedia, scientific papers), amounting to trillions of tokens. Through next-token prediction, the model learns a compressed statistical model of language that also encodes broad factual knowledge.
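The next-token objective can be illustrated at toy scale with bigram counts: given a context token, predict the most frequent follower observed in the corpus. Real LLMs replace these counts with a deep network conditioned on thousands of prior tokens, but the training signal is the same; the nine-word corpus here is purely illustrative:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# Count, for each token, which tokens follow it and how often.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(token):
    """Return the most likely next token under the bigram counts."""
    counts = following[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # 'cat' — follows 'the' twice, 'mat' once
```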
Fine-tuning and Alignment
Raw pre-trained models produce outputs that follow the training distribution but aren't necessarily helpful or safe. Instruction tuning trains models to follow user instructions. RLHF further shapes outputs using human preference feedback, making models more helpful, harmless, and honest.
Context Window
The context window defines how much text an LLM can "see" at once. The original GPT-3 had a 2k-token window; Claude 3 supports up to 200k tokens; some models now reach 1M+ tokens. Longer contexts enable document-level reasoning and multi-document synthesis.
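A practical consequence of a finite window is that applications must budget tokens when stuffing documents into a prompt. A minimal sketch, using whitespace splitting as a crude stand-in for a real tokenizer (actual token counts from a tokenizer such as tiktoken will differ):

```python
def fit_to_window(chunks, budget):
    """Greedily keep whole chunks until the approximate token budget is spent."""
    kept, used = [], 0
    for chunk in chunks:
        n = len(chunk.split())   # crude proxy for a real token count
        if used + n > budget:
            break                # dropping the rest keeps us inside the window
        kept.append(chunk)
        used += n
    return kept

docs = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(fit_to_window(docs, budget=5))  # first two chunks fit (3 + 2 tokens)
```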
Major Models (2024–2026)
| Model | Developer | Parameters | Context |
|-------|-----------|------------|---------|
| GPT-4o | OpenAI | ~200B (est.) | 128k |
| Claude 3.5 Sonnet | Anthropic | Undisclosed | 200k |
| Gemini 1.5 Pro | Google DeepMind | Undisclosed | 1M |
| Llama 3.1 405B | Meta | 405B | 128k |
| Mistral Large | Mistral AI | 123B | 128k |
Capabilities and Limitations
Strengths:
- Text generation, summarisation, translation
- Question answering and reasoning over documents
- Code generation and debugging
- Few-shot learning from examples in context
Limitations:
- Hallucinations — generating plausible but factually incorrect statements
- Knowledge cutoff — training knowledge frozen at the pre-training cutoff date
- Reasoning limits — models struggle with multi-step mathematical reasoning
- Context faithfulness — can ignore or misinterpret provided context
- Sycophancy — tendency to agree with user premises even when wrong
Prompt Engineering
Getting consistent, high-quality output from LLMs requires skill in prompt engineering:
- Clear instructions — be explicit about task, format, and constraints
- Few-shot examples — provide 2–5 examples of desired input-output pairs
- Chain-of-thought — ask the model to reason step by step before answering
- Role assignment — "Act as an experienced Malaysian lawyer…"
- Output format specification — request JSON, markdown tables, or bullet points
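The techniques above compose naturally into a single prompt template. A minimal sketch (the task, examples, and JSON schema below are invented for illustration):

```python
def build_prompt(task, examples, question):
    """Combine role assignment, instructions, format spec, CoT, and few-shot examples."""
    lines = [
        "You are a careful data analyst.",                      # role assignment
        f"Task: {task}",                                        # clear instruction
        'Respond as JSON: {"label": ..., "reason": ...}',       # output format spec
        "Think step by step before answering.",                 # chain-of-thought
        "",
        "Examples:",
    ]
    for inp, out in examples:                                   # few-shot examples
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {question}\nOutput:")
    return "\n".join(lines)

prompt = build_prompt(
    task="Classify the sentiment of the review.",
    examples=[("Loved it!", '{"label": "positive", "reason": "praise"}')],
    question="Terrible battery life.",
)
print(prompt)
```

Ending the prompt with a dangling `Output:` nudges the model to complete the final pair in the same format as the examples.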
Retrieval-Augmented Generation (RAG)
RAG addresses the knowledge cutoff and hallucination problems by:
- Retrieving relevant documents from a vector database at query time
- Providing retrieved documents as context to the LLM
- Grounding the LLM's answer in actual retrieved evidence
RAG is now the dominant pattern for enterprise LLM deployments over private knowledge bases.
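The retrieve-then-ground loop can be sketched end to end with a toy bag-of-words retriever in place of a real vector database and embedding model; the two documents, their IDs, and the query below are invented for illustration:

```python
import math
from collections import Counter

docs = {
    "leave-policy": "Employees accrue 18 days of annual leave per year.",
    "expenses": "Submit expense claims within 30 days with receipts.",
}

def vectorise(text):
    """Bag-of-words counts; a real system would use dense embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Step 1: rank documents by similarity to the query."""
    qv = vectorise(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, vectorise(docs[d])), reverse=True)
    return ranked[:k]

query = "how many days of annual leave do I get?"
# Steps 2–3: pass the retrieved text to the LLM as grounding context.
context = "\n".join(docs[d] for d in retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(retrieve(query))  # ['leave-policy']
```

The "answer using only this context" instruction is what grounds the response: the model is steered toward the retrieved evidence rather than its (possibly stale or hallucinated) parametric knowledge.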
References
- Vaswani, A. et al. (2017). "Attention Is All You Need." NeurIPS 2017.
- Ouyang, L. et al. (2022). "Training language models to follow instructions with human feedback." NeurIPS 2022.
- MDEC (2024). Generative AI Adoption Tracker: Malaysia Enterprise Survey Q1 2024.