Neural Scaling Laws

Neural scaling laws are empirical relationships describing how the performance of neural networks improves predictably as a function of model size, dataset size, and compute budget, enabling principled resource allocation for AI training.

7 min read | Last updated June 2026 | Foundations

Neural scaling laws are quantitative empirical relationships that describe how the performance of neural networks — typically measured as loss on a held-out evaluation set — improves as a function of three key variables: the number of model parameters, the size of the training dataset, and the total compute budget available for training. First systematically characterised for large language models by researchers at OpenAI in 2020, scaling laws have become a foundational tool for planning AI training runs, allocating resources efficiently, and extrapolating expected performance improvements from smaller experiments to larger ones.

The central finding is that performance follows smooth power-law relationships across many orders of magnitude of scale. If one doubles the number of model parameters while data and compute are not limiting, the loss falls by a roughly constant fraction, whatever the starting size. This regularity across an enormous range of scales, from models with millions of parameters to those with hundreds of billions, was initially surprising and has profoundly influenced the strategic direction of AI development.
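The doubling behaviour described above can be sketched in a few lines. The single-variable form L(N) = (N_c / N)^alpha_N and the constants below are approximate values reported in Kaplan et al. (2020); they are used here only for illustration, and the function names are hypothetical.

```python
# Illustrative single-variable power law: L(N) = (N_C / N) ** ALPHA_N.
# ALPHA_N and N_C are approximate values from Kaplan et al. (2020);
# treat them as illustrative, not as constants for any new model.
ALPHA_N = 0.076
N_C = 8.8e13  # non-embedding parameters


def loss_from_params(n_params: float) -> float:
    """Predicted loss when model size is the only bottleneck."""
    return (N_C / n_params) ** ALPHA_N


def doubling_ratio(alpha: float = ALPHA_N) -> float:
    """Factor by which the loss is multiplied each time N doubles."""
    return 2.0 ** (-alpha)


# The ratio is the same at every starting size: that is the power law.
for n in (1e6, 1e8, 1e10):
    ratio = loss_from_params(2 * n) / loss_from_params(n)
    print(f"N={n:.0e}  loss={loss_from_params(n):.3f}  ratio={ratio:.4f}")
```

Under these assumed constants, each doubling of N multiplies the loss by about 0.95, a reduction of roughly 5%, regardless of where on the curve one starts.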

The Kaplan et al. Scaling Laws (2020)

The landmark scaling law study, published by Jared Kaplan, Sam McCandlish, and colleagues at OpenAI in January 2020, analysed language model test loss (cross-entropy, the logarithm of perplexity) as a function of model size (N, measured in non-embedding parameters), dataset size (D, measured in tokens), and compute (C, measured in floating point operations). Their key findings were:

Performance improves as a power law in each of N, D, and C when the other two are not limiting. The fitted trends depend strongly on scale and only weakly on architectural details such as the ratio of depth to width.

For a fixed compute budget C, there is an optimal allocation between model size and training data. The 2020 paper found that larger models are significantly more sample-efficient than smaller ones, suggesting that, for a given compute budget, one should prefer to train a larger model on fewer tokens rather than a smaller model on more tokens.

Performance is primarily limited by whichever of N, D, or C is the tightest bottleneck. Increasing one resource while holding the others fixed produces diminishing returns beyond a certain scale.
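The bottleneck effect in the last finding can be illustrated with the joint form proposed in the 2020 paper, L(N, D) = [(N_c / N)^(alpha_N / alpha_D) + D_c / D]^alpha_D. This is a minimal sketch: the constants are approximate, illustrative values, and the function name `joint_loss` is hypothetical.

```python
# Joint loss in the form proposed by Kaplan et al. (2020).
# All four constants are approximate and for illustration only.
ALPHA_N, ALPHA_D = 0.076, 0.095
N_C, D_C = 8.8e13, 5.4e13


def joint_loss(n_params: float, n_tokens: float) -> float:
    """Loss when both model size and dataset size may be limiting."""
    model_term = (N_C / n_params) ** (ALPHA_N / ALPHA_D)
    data_term = D_C / n_tokens
    return (model_term + data_term) ** ALPHA_D


# Hold the dataset fixed at one billion tokens and grow the model.
FIXED_TOKENS = 1e9
sizes = [1e8, 1e9, 1e10, 1e11]
losses = [joint_loss(n, FIXED_TOKENS) for n in sizes]
for n, loss in zip(sizes, losses):
    print(f"N={n:.0e}  loss={loss:.4f}")

# Each tenfold increase in N buys less than the previous one,
# because the fixed dataset becomes the bottleneck.
gains = [a - b for a, b in zip(losses, losses[1:])]
print("gains per 10x in N:", [round(g, 4) for g in gains])
```

With the data term fixed, the loss approaches a floor set by D alone, which is the diminishing-returns behaviour the text describes.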

The Chinchilla Scaling Laws (2022)

A subsequent analysis by Jordan Hoffmann, Sebastian Borgeaud, and colleagues at DeepMind (now Google DeepMind), published in March 2022 and known as the "Chinchilla paper" after the model it produced, refined and partially revised the 2020 findings. Using a more rigorous experimental design with models trained at compute-optimal frontiers, the Chinchilla analysis found that the 2020 recommendation to prefer larger models was overstated.

The Chinchilla paper found that, for a given compute budget, the optimal allocation is to train a model with roughly 20 tokens of training data for every parameter. This is substantially more data than the ratio implied by the 2020 results. Many existing large language models, including GPT-3 (trained with approximately 300 billion tokens on 175 billion parameters, a ratio of 1.7:1 rather than 20:1), were significantly undertrained relative to their size.

Chinchilla (70 billion parameters, trained on 1.4 trillion tokens) outperformed much larger models including Gopher (280B parameters, 300B tokens) and GPT-3 on a wide range of tasks, validating the revised scaling prescription. The Chinchilla results shifted industry practice significantly: subsequent models including LLaMA, Mistral, and Falcon adopted much longer training runs relative to model size.
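The 20-tokens-per-parameter rule can be turned into a rough budget calculator. This sketch assumes the common approximation C ≈ 6·N·D for training FLOPs, which is not stated in the text above; the function names are hypothetical, and the 20:1 ratio is a rule of thumb rather than an exact constant.

```python
import math

TOKENS_PER_PARAM = 20.0       # rule of thumb quoted in the text
FLOPS_PER_PARAM_TOKEN = 6.0   # assumption: C is approximately 6 * N * D


def compute_optimal(flops: float) -> tuple[float, float]:
    """Split a FLOP budget into (parameters, tokens) at a 20:1 ratio."""
    n_params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """Ratio used to judge whether a model is undertrained for its size."""
    return n_tokens / n_params


# Budget implied by Chinchilla's own configuration (70B params, 1.4T tokens).
chinchilla_flops = FLOPS_PER_PARAM_TOKEN * 70e9 * 1.4e12
n, d = compute_optimal(chinchilla_flops)
print(f"optimal: {n:.3g} params, {d:.3g} tokens")

# Ratios for the models named in the text.
print("GPT-3 :", round(tokens_per_param(175e9, 300e9), 2))
print("Gopher:", round(tokens_per_param(280e9, 300e9), 2))
```

Feeding in Chinchilla's own budget recovers its configuration, while GPT-3 and Gopher come out at under 2 tokens per parameter, which is the sense in which they were undertrained.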

Extensions and Nuances

Emergent Capabilities

While scaling laws predict smooth performance improvement on average loss metrics, researchers have observed that certain capabilities appear abruptly at specific scales — so-called emergent abilities that seem absent below a threshold and present above it. Examples include few-shot arithmetic, chain-of-thought reasoning, and calibrated uncertainty. The abruptness of emergence is partly an artefact of coarse evaluation metrics: more sensitive measures often reveal smoother transitions. Nonetheless, the possibility of qualitative capability changes at scale has important implications for AI safety and planning.
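One way to see how a coarse metric can manufacture apparent abruptness is a toy model, sketched below. It assumes each token of an answer is correct independently with the same probability, which real models do not satisfy exactly; the accuracy values are hypothetical.

```python
def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Probability that every token of an answer is right, assuming
    independent per-token errors (a simplifying assumption)."""
    return per_token_accuracy ** answer_length


# Per-token accuracy improves smoothly with scale (hypothetical values).
smooth_accuracies = [0.50, 0.70, 0.80, 0.90, 0.95, 0.99]
ANSWER_LENGTH = 10  # e.g. a multi-digit arithmetic answer

for acc in smooth_accuracies:
    em = exact_match_rate(acc, ANSWER_LENGTH)
    print(f"per-token={acc:.2f}  exact-match={em:.3f}")
```

The per-token metric rises steadily, but exact match stays near zero for most of the range and then climbs steeply, which looks like a threshold even though nothing discontinuous happened underneath.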

Data Quality and Repetition

Scaling laws are usually fitted in a regime where each training token is seen about once. Muennighoff et al. (2023) found that repeating data for a few epochs (up to roughly four) costs little compared with fresh data, but that the value of further repetition decays rapidly, meaning that in practice the supply of high-quality, non-repeated data represents a real constraint. Research on synthetic data generation (using AI to produce additional training material) and data quality filtering has become increasingly important as web-scale corpora approach exhaustion.

Inference Compute

The original scaling laws focus on training compute. More recent work has extended the analysis to inference compute — studying how performance improves when more computation is applied during inference, for example through chain-of-thought reasoning, repeated sampling, or verifier-guided search. This has become especially relevant with reasoning models such as OpenAI o3 and DeepSeek-R1, which spend substantially more inference compute than earlier models.
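Repeated sampling is the simplest of these strategies to model. The sketch below assumes that samples are independent, that each solves the task with the same probability, and that a perfect verifier picks out any correct sample; all three are idealisations, and the success rate used is hypothetical.

```python
def coverage(per_sample_success: float, num_samples: int) -> float:
    """Chance that at least one of k independent samples is correct."""
    return 1.0 - (1.0 - per_sample_success) ** num_samples


# A hypothetical model that solves a task 10% of the time per attempt.
P_SUCCESS = 0.10
for k in (1, 10, 100):
    print(f"samples={k:<4d} coverage={coverage(P_SUCCESS, k):.4f}")
```

Under these assumptions a model that succeeds one time in ten reaches about 65% coverage with ten samples. In practice, correlated errors and imperfect verifiers reduce the benefit.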

Implications for AI Development

Scaling laws provide AI laboratories with a rational basis for planning large training runs. By running small-scale experiments and fitting power laws to the results, researchers can extrapolate the expected performance of models that would take months and tens of millions of dollars to train, validating the likely outcome before committing resources.
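The fit-then-extrapolate workflow can be sketched with a log-log least-squares fit. The data below is synthetic, generated from a known power law so the fit can be checked; it is not from any real training run. The sketch also assumes a pure power law with no irreducible-loss offset, which published fits often include.

```python
import math


def fit_power_law(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares fit of y = a * x**(-b) in log-log space; returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict(a: float, b: float, x: float) -> float:
    """Evaluate the fitted power law at a new scale."""
    return a * x ** (-b)


# Synthetic "small-scale experiments": compute in FLOPs, loss from a known law.
small_compute = [1e17, 1e18, 1e19, 1e20]
small_losses = [12.0 * c ** -0.05 for c in small_compute]

a_fit, b_fit = fit_power_law(small_compute, small_losses)
print(f"fitted exponent: {b_fit:.4f}")
print(f"extrapolated loss at 1e23 FLOPs: {predict(a_fit, b_fit, 1e23):.4f}")
```

With noisy real measurements the fitted exponent carries uncertainty, and extrapolating three orders of magnitude beyond the data amplifies it, so such projections are usually reported with error bars.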

The predictability of scaling has driven a sustained increase in AI training expenditure: each doubling of compute yields a predictable performance improvement, making continued investment rational as long as performance remains economically valuable.

Malaysian Context — Scaling Laws and Malaysian AI Research Strategy

Understanding neural scaling laws is increasingly important for Malaysian AI researchers and policymakers making decisions about compute investment and model training strategy. The MyDigital Blueprint and the National AI Roadmap both reference AI infrastructure investment, including the development of national computing capacity. Scaling laws provide a principled framework for determining how much compute is needed to achieve target performance levels, which is directly relevant to procurement decisions.

MDEC and MOSTI have begun engaging with scaling law research to inform Malaysia's AI compute strategy. Rather than attempting to match the frontier training runs of laboratories such as OpenAI or Google — which require many thousands of GPUs and billions of dollars — Malaysia's strategy focuses on efficiently training mid-size models (3B-30B parameters) on locally relevant datasets in Bahasa Malaysia and regional languages, where the Chinchilla prescriptions apply and where compute efficiency is critical.
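As a rough worked example of what the Chinchilla prescription implies for the 3B-30B range mentioned above, the arithmetic below uses the 20-tokens-per-parameter rule of thumb together with the common approximation $C \approx 6ND$ for training FLOPs. Both are approximations, and the figures are order-of-magnitude estimates, not procurement numbers.

```latex
\begin{aligned}
N = 3 \times 10^{9}:\quad
  & D \approx 20N = 6 \times 10^{10}\ \text{tokens}, \\
  & C \approx 6ND = 6 \cdot (3 \times 10^{9}) \cdot (6 \times 10^{10})
    \approx 1.1 \times 10^{21}\ \text{FLOPs}. \\[4pt]
N = 3 \times 10^{10}:\quad
  & D \approx 20N = 6 \times 10^{11}\ \text{tokens}, \\
  & C \approx 6ND = 6 \cdot (3 \times 10^{10}) \cdot (6 \times 10^{11})
    \approx 1.1 \times 10^{23}\ \text{FLOPs}.
\end{aligned}
```

A tenfold increase in model size therefore implies roughly a hundredfold increase in training compute, and a data requirement of tens to hundreds of billions of tokens.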

Universiti Malaya's AI research centre and Universiti Sains Malaysia's School of Computer Sciences have incorporated scaling law concepts into postgraduate AI curricula, recognising that understanding the relationship between compute, data, and performance is essential for responsible AI research planning.

Malaysian cloud providers and data centre operators — including facilities in Cyberjaya operated by NTT, STT GDC, and TIME dotCom — are positioning expanded GPU capacity partly on the basis of scaling law projections that sustained AI training demand will grow with model scale. This creates a feedback loop between academic scaling research and Malaysian data centre investment.

For Malaysian AI startups pursuing foundation model development, scaling laws provide a strategic guide: investing in high-quality Bahasa Malaysia and Southeast Asian language data — particularly scarce relative to English corpora — may yield scaling benefits disproportionate to data volume, since data quality significantly shifts the scaling curve.

References

Kaplan, J., et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361. OpenAI.
Hoffmann, J., et al. (2022). Training Compute-Optimal Large Language Models. arXiv:2203.15556. DeepMind.
Wei, J., et al. (2022). Emergent Abilities of Large Language Models. Transactions on Machine Learning Research.
Muennighoff, N., et al. (2023). Scaling Data-Constrained Language Models. NeurIPS 2023.
Anthropic. (2023). Scaling: The State of Play in AI. anthropic.com/research.

Tags: scaling laws, neural networks, compute, Chinchilla, Kaplan, large language models

Type	Empirical relationships in deep learning
Key papers	Kaplan et al. 2020 (OpenAI); Hoffmann et al. 2022 (Chinchilla)
Variables	Model parameters (N), training tokens (D), compute (C)
Key insight	Performance follows power laws across many orders of magnitude
Related	Large language models, deep learning, GPU cluster, transformer architecture