Emergent Abilities

Capabilities that appear in large language models only once they reach a certain scale, and which are not present in smaller models, along with the debate over whether such abilities are real or measurement artifacts.

4 min read | Last updated July 2026 | Foundations

Emergent abilities are capabilities that large language models exhibit at large scale but that are absent, or on which performance is near chance, in smaller models of the same family. The term was popularised in 2022 to describe the observation that certain skills, such as multi-step arithmetic, answering questions in specialised domains, or following instructions, seemed to appear suddenly once a model exceeded a threshold in size or training compute, rather than improving smoothly as the model grew. The concept became influential in discussions of how model capabilities scale and in debates about how predictable, and how safe, increasingly large models are.

The original observation

Researchers plotting model performance against scale reported that on some tasks a model's accuracy remained close to chance across a wide range of sizes and then rose sharply beyond a certain point. Because the improvement looked like a phase transition rather than gradual progress, these abilities were called emergent, borrowing a term from complex systems where qualitatively new behaviour appears at scale. If genuine, emergence has important implications: it would mean that simply making models larger could unlock unforeseen capabilities that were not visible in smaller versions, making capabilities hard to predict in advance.

The mirage critique

The claim was challenged in an influential 2023 paper, "Are Emergent Abilities of Large Language Models a Mirage?", by Rylan Schaeffer, Brando Miranda, and Sanmi Koyejo, which received a NeurIPS outstanding paper award. The authors argued that apparent emergence can be an artifact of the evaluation metric rather than a real change in model behaviour. Many benchmarks use harsh all-or-nothing scoring, for example marking a long arithmetic answer correct only if every digit is right. Under such a discontinuous metric, a model whose underlying competence is improving smoothly can appear to jump from zero to success once it crosses a hidden threshold. When the same tasks are scored with continuous or more forgiving metrics, the paper showed, the sharp jumps often smooth out into gradual, predictable improvement.

This critique reframed part of the debate as one about measurement. It did not claim that large models lack impressive capabilities, only that the specific appearance of sudden, unpredictable emergence can often be traced to how performance is measured.
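The metric effect described above can be sketched with a toy calculation. It assumes, purely for illustration, that each token of an answer is correct independently with probability p, so an all-or-nothing score on an answer of length L succeeds with probability p^L. The per-token accuracies, the answer length, and the function name are hypothetical and are not taken from the paper.

```python
# Toy illustration: a smoothly improving per-token accuracy can look like a
# sudden jump when scored with an all-or-nothing (exact-match) metric.
# All numbers are hypothetical, not measurements from any real model.

ANSWER_LENGTH = 10  # assumed number of tokens (for example digits) in the answer


def exact_match_rate(per_token_accuracy: float, length: int = ANSWER_LENGTH) -> float:
    """Probability that every token is right, assuming independent errors."""
    return per_token_accuracy ** length


# Hypothetical per-token accuracies for models of increasing scale.
per_token = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]

for p in per_token:
    # The left column rises steadily; the right column stays near zero
    # for the smaller values and then climbs steeply.
    print(f"per-token accuracy {p:.2f} -> exact-match rate {exact_match_rate(p):.3f}")
```

Under these assumptions a per-token accuracy of 0.80 gives an exact-match rate of roughly 0.11, while 0.95 gives roughly 0.60. The underlying competence changes gradually, but the strict score makes the change look abrupt.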

Current understanding

The discussion has since become more nuanced. Analysts distinguish between the strong claim that capabilities appear unpredictably at scale and the weaker, well-supported observation that larger models are simply more capable across many tasks. Some researchers note that even if a metric is the proximate cause of a discontinuity, discontinuous metrics can still be the ones that matter in practice, since many real applications care about getting an answer entirely right rather than partially right. Work has also explored predicting future capabilities by fine-tuning or by extrapolating trends, aiming to make capability forecasting more reliable. Emergent abilities remain a touchstone in conversations about neural scaling laws, evaluation design, and AI safety, where the predictability of capability gains bears directly on how carefully new models should be tested before release.
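As a rough sketch of what extrapolating a trend could look like, the example below fits a straight line to a continuous metric and then converts the forecast into an exact-match estimate. It assumes that the log-odds of per-token accuracy grow linearly with the logarithm of training compute and that token errors are independent. Both are simplifying assumptions made for this illustration, and the data points and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical observations: log10 of training compute and per-token accuracy.
log_compute = [20.0, 21.0, 22.0, 23.0]
per_token_acc = [0.60, 0.70, 0.78, 0.85]

# Fit the trend on the smooth metric, not on the all-or-nothing one.
slope, intercept = fit_line(log_compute, [logit(p) for p in per_token_acc])


def forecast(log_c, answer_length=10):
    """Extrapolate per-token accuracy, then derive an exact-match estimate."""
    p = sigmoid(slope * log_c + intercept)
    return p, p ** answer_length


p_next, em_next = forecast(24.0)
print(f"forecast per-token accuracy: {p_next:.3f}")
print(f"implied exact-match rate:    {em_next:.3f}")
```

The design choice here is to forecast on the continuous metric and only afterwards translate to the strict one, since the strict score gives little signal while it sits near zero. A real forecast would need measured data and a justified functional form.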

Malaysian Context — Capability Forecasting and AI Governance

The debate over emergent abilities matters to Malaysia's approach to AI governance because it concerns how predictable and controllable advanced models are. The National AI Office established under the MyDigital agenda, together with the Malaysia AI Governance Framework, is tasked with promoting responsible AI. Whether large models can gain unforeseen capabilities informs how rigorously such models should be evaluated before deployment in sensitive public services.

For Malaysian institutions building or adopting local large language models, including MaLLaM, ILMU, and the regional SEA-LION family, the practical lesson from the mirage critique is that capability should be measured with care. Benchmarks designed for English and Western contexts may understate or misrepresent a model's competence in Bahasa Malaysia, in code-switched Malaysian usage, or on locally relevant tasks, so evaluation choices strongly shape reported performance. Research groups at Universiti Malaya, Universiti Sains Malaysia, and Universiti Kebangsaan Malaysia, alongside government research through MIMOS, contribute to developing evaluation practices suited to the Malaysian language environment.
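To make the point about evaluation choices concrete, the sketch below scores the same small set of answers with three metrics of decreasing strictness. The example answers, the normalisation rules, and the function names are hypothetical and chosen only for illustration. They do not represent any benchmark or any of the models named above.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def strict_match(pred, ref):
    """1.0 only if the strings are identical character for character."""
    return float(pred == ref)


def normalized_match(pred, ref):
    """1.0 if the strings agree after normalisation."""
    return float(normalize(pred) == normalize(ref))


def token_f1(pred, ref):
    """Harmonic mean of token precision and recall after normalisation."""
    pred_tokens = normalize(pred).split()
    ref_tokens = normalize(ref).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (model answer, reference answer) pairs.
examples = [
    ("Kuala Lumpur.", "Kuala Lumpur"),
    ("ibu negara Malaysia ialah Kuala Lumpur", "Kuala Lumpur"),
    ("George Town", "George Town"),
]

for name, metric in [("strict", strict_match),
                     ("normalized", normalized_match),
                     ("token F1", token_f1)]:
    mean_score = sum(metric(pred, ref) for pred, ref in examples) / len(examples)
    print(f"{name:>10}: {mean_score:.3f}")
```

The three metrics give different averages for identical outputs, so a reported score is only meaningful alongside the scoring rule that produced it. Normalisation suited to Bahasa Malaysia or code-switched text would need more care than the ASCII-only rules assumed here.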

The topic also connects to cybersecurity and safety oversight coordinated by the National Cyber Security Agency (NACSA), since unpredictable capability gains are among the risks that AI-safety and red-teaming efforts seek to anticipate. As Malaysian banks, telcos, and public agencies deploy ever larger models, sound capability forecasting supports the balance between innovation and public trust that national policy seeks.

References

Wei, J., et al. (2022). Emergent Abilities of Large Language Models. Transactions on Machine Learning Research.
Schaeffer, R., Miranda, B., and Koyejo, S. (2023). Are Emergent Abilities of Large Language Models a Mirage? NeurIPS.
Center for Security and Emerging Technology. (2023). Emergent Abilities in Large Language Models: An Explainer. cset.georgetown.edu.

Tags: large language models, scaling, capabilities, evaluation

Type	Scaling phenomenon in LLMs
Popularised	2022 (Wei et al.)
Definition	Abilities absent in small models, present in large ones
Key debate	Real emergence vs. metric artifact
Related	Neural scaling laws, In-context learning