What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Reasoning Models

Reasoning models are large language models trained to generate extended internal deliberation before producing a final answer, using test-time compute to improve accuracy on complex tasks such as mathematics, coding, and multi-step logic.

6 min read | Last updated June 2026 | Models

Reasoning models are a class of large language models designed to improve performance on complex tasks by spending additional computation at inference time to deliberate before producing an answer. Rather than generating a response immediately from the prompt, a reasoning model first produces an extended sequence of intermediate thoughts (exploring approaches, checking assumptions, and self-correcting) before emitting its final output. This trace is sometimes called a chain of thought or an internal scratchpad. In most commercial reasoning models, the behaviour of producing it is learned through reinforcement learning rather than elicited by explicit prompting.

The Test-Time Compute Paradigm

The dominant scaling strategy for AI models through 2023 was training-time scaling: building larger models, training on more data, and running longer training runs. This approach was characterised empirically by scaling laws such as the compute-optimal Chinchilla analysis. It produced substantial capability gains but ran into practical limits of cost and the availability of high-quality training data.
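The Chinchilla result is often summarised with two rules of thumb. First, training a dense model with N parameters on D tokens costs roughly C ≈ 6·N·D FLOPs. Second, a compute-optimal run uses about 20 training tokens per parameter. The sketch below combines the two approximations to estimate a compute-optimal model size for a given budget. It is a back-of-the-envelope calculation rather than the paper's fitted scaling law, and the function name is illustrative.

```python
import math

# Rule-of-thumb constants (approximations, not the paper's fitted coefficients).
FLOPS_PER_PARAM_TOKEN = 6  # C ≈ 6 * N * D for dense transformer training
TOKENS_PER_PARAM = 20      # Chinchilla-style compute-optimal ratio, D ≈ 20 * N


def compute_optimal_size(compute_flops: float) -> tuple[float, float]:
    """Return (parameters, training tokens) that roughly exhaust a FLOP budget.

    Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2,
    so N = sqrt(C / 120) and D = 20 * N.
    """
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    # A budget of 5.76e23 FLOPs, roughly the scale reported for Chinchilla itself.
    n, d = compute_optimal_size(5.76e23)
    print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.2f}T tokens")
```

For that budget the sketch prints about 69B parameters and 1.39T tokens. Chinchilla itself was a 70B-parameter model trained on about 1.4 trillion tokens. Under this heuristic, every further gain demands a quadratically larger compute budget, which is part of why a second scaling axis became attractive.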

Reasoning models represent a complementary scaling axis: test-time compute scaling. The insight is that spending more computation during inference — generating thousands of tokens of deliberation rather than a short direct response — can substantially improve performance on tasks that benefit from iterative refinement. A model that is allowed to "think" for longer on a hard mathematics problem produces more accurate results than the same model answering directly, even without any change to its weights.
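Long internal deliberation is not the only way to spend test-time compute. A simpler, well-studied variant is self-consistency: sample several independent answers and return the most common one. The sketch below uses a deliberately simplified model in which each sample is independently correct with probability p, and every wrong sample gives the same wrong answer, which is the worst case for voting. Under those assumptions, majority-vote accuracy follows exactly from the binomial distribution. It rises with the number of samples whenever p > 0.5, with no change to the model's weights.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Simplifying assumptions: each sample is correct with probability p, and
    every incorrect sample gives the same wrong answer. n must be odd so
    that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    needed = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


if __name__ == "__main__":
    # Same hypothetical model (p = 0.6), increasing inference budget.
    for n in (1, 5, 25, 101):
        print(n, round(majority_vote_accuracy(0.6, n), 3))
```

With p = 0.6, accuracy climbs from 0.6 with a single sample towards 1 as n grows. If p is below 0.5, the same arithmetic shows voting making results worse. Extra test-time compute only helps when the underlying model is more often right than wrong.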

This observation was not new — prompt-based chain-of-thought techniques had demonstrated similar effects since 2022. What changed with reasoning models was making the deliberation process trainable rather than prompt-dependent, and scaling it to a degree that produced significant benchmark improvements.

Training Methodology

Reasoning models are typically trained using reinforcement learning (RL) with outcome-based rewards. The model receives reward when its final answer is correct (as judged by a verifier) and no reward otherwise. The intermediate reasoning trace is not directly supervised; instead, the RL process discovers reasoning strategies that tend to produce correct final answers.
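An outcome-based reward can be as simple as a rule-based checker. It ignores the reasoning trace and compares only the final answer with a known reference. The sketch below assumes a hypothetical output convention in which the model ends its response with a line of the form `Answer: <value>`. Real training pipelines define their own formats and verifiers (for code, for example, running unit tests), so treat the parsing here as illustrative.

```python
import re

# Hypothetical convention: the final answer appears on a line "Answer: <value>".
ANSWER_PATTERN = re.compile(r"^Answer:\s*(.+?)\s*$", re.MULTILINE)


def outcome_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the completion's final answer matches the reference, else 0.0.

    The reasoning that precedes the answer is never inspected: only the
    outcome is rewarded, which is what makes this an outcome-based reward.
    """
    matches = ANSWER_PATTERN.findall(completion)
    if not matches:
        return 0.0  # no parsable final answer earns no reward
    final = matches[-1]  # if several "Answer:" lines appear, the last one counts
    return 1.0 if final == reference.strip() else 0.0


if __name__ == "__main__":
    trace = "17 * 3 = 51, then 51 - 9 = 42.\nAnswer: 42"
    print(outcome_reward(trace, "42"))               # 1.0
    print(outcome_reward(trace, "41"))               # 0.0
    print(outcome_reward("I think it's 42", "42"))   # 0.0 (no answer line)
```

Because the trace itself is unscored, the training process is free to discover whatever intermediate strategies raise the rate of correct final answers. That freedom is also why reward design (answer formats, exact versus tolerant matching) matters so much in practice.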

DeepSeek-R1, released in January 2025, showed how far outcome-rewarded RL can go. Its technical report describes DeepSeek-R1-Zero, a precursor trained through pure RL with outcome rewards and no supervised fine-tuning on reasoning traces, which nonetheless learned extended multi-step reasoning. The released DeepSeek-R1 added a small supervised "cold-start" stage, among other steps, to make its reasoning more readable. DeepSeek-R1 matched the performance of OpenAI o1 on several benchmarks at approximately 70% lower inference cost. DeepSeek also published the model weights openly alongside the report, catalysing extensive follow-on research.
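The DeepSeek-R1 report trains with Group Relative Policy Optimization (GRPO), an RL algorithm introduced in DeepSeek's earlier DeepSeekMath work. GRPO avoids training a separate value model. For each prompt it samples a group of completions and scores each one, for example with a correct/incorrect check. It then sets each completion's advantage to its reward normalised by the group's mean and standard deviation. The sketch below computes only that advantage step. The full GRPO objective also includes a clipped policy-ratio term and a KL penalty against a reference model, both omitted here. Using the population standard deviation, and returning zeros when all rewards are equal, are choices made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalise each sampled completion's reward against its own group.

    Completions that beat the group average get a positive advantage (their
    tokens are reinforced); those below average get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # Every completion scored the same, so the group carries no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of four completions for one prompt: two correct, two wrong.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Comparing each completion against its siblings for the same prompt gives a baseline without a learned critic. This is one reason the approach is comparatively cheap to run at scale.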

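OpenAI's o1 and o3 models are also trained with large-scale reinforcement learning, but OpenAI has published few details of their training. A related technique explored in the research literature is the process reward model (PRM), which scores the quality of each intermediate reasoning step rather than only the final answer. In results announced in December 2024, o3 scored 75.7% on the ARC-AGI benchmark in a compute-limited configuration. ARC-AGI is a test of abstract pattern reasoning designed to be easy for humans but difficult for AI systems.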
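A process reward model can also be used at inference time to rerank candidate solutions. The PRM assigns each reasoning step a probability of being correct, and a solution's overall score aggregates those step scores. The sketch below takes the product of step probabilities (taking the minimum is another common choice) and picks the best of several candidates. The step scores are hypothetical placeholders for a trained PRM's outputs. Nothing here describes how any particular commercial model works internally.

```python
import math

# Hypothetical PRM output format: (final answer, per-step correctness probabilities).
Candidate = tuple[str, list[float]]


def solution_score(step_scores: list[float]) -> float:
    """Aggregate per-step PRM scores; one weak step drags the whole score down."""
    return math.prod(step_scores)


def best_of_n(candidates: list[Candidate]) -> str:
    """Return the final answer of the candidate with the highest aggregated score."""
    answer, _ = max(candidates, key=lambda c: solution_score(c[1]))
    return answer


if __name__ == "__main__":
    candidates = [
        ("x = 7", [0.95, 0.90, 0.20]),  # strong opening, dubious final step
        ("x = 5", [0.85, 0.80, 0.90]),  # consistently plausible steps
        ("x = 9", [0.60, 0.50, 0.70]),
    ]
    print(best_of_n(candidates))  # x = 5
```

The first candidate has the strongest opening steps but loses because of its weak final step. Aggregating over steps favours solutions that are sound throughout, which is the information an outcome-only score cannot provide.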

Characteristics and Trade-offs

Reasoning models generate substantially more tokens than standard language models. A standard model might answer a mathematics question in 50 tokens, while a reasoning model might generate 2,000 tokens of deliberation before giving its final answer to the same question. Inference cost and latency per query are therefore significantly higher. Some analysts project that reasoning workloads will account for 75% of total AI inference compute by 2030.
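The cost gap follows directly from per-token pricing. Providers typically bill reasoning tokens as output tokens even when the deliberation is hidden from the user; OpenAI documents this for its o-series models. The 2,000 deliberation tokens in the 50-token versus 2,000-token example are therefore paid for like answer text. The sketch below uses hypothetical placeholder prices and applies the same per-token rates to both models, which isolates the effect of token count. Real price sheets differ by model.

```python
# Hypothetical prices in US dollars per million tokens (placeholders, not any
# provider's actual rates). Reasoning tokens are billed at the output rate.
PRICE_IN_PER_M = 1.00
PRICE_OUT_PER_M = 4.00


def query_cost(input_tokens: int, reasoning_tokens: int, answer_tokens: int) -> float:
    """Cost of one query when hidden reasoning tokens are billed as output."""
    output_tokens = reasoning_tokens + answer_tokens
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000


if __name__ == "__main__":
    prompt = 200  # hypothetical prompt length, same maths question for both models
    standard = query_cost(prompt, reasoning_tokens=0, answer_tokens=50)
    reasoning = query_cost(prompt, reasoning_tokens=2_000, answer_tokens=50)
    print(f"standard:  ${standard:.6f}")
    print(f"reasoning: ${reasoning:.6f}  ({reasoning / standard:.0f}x)")
```

Here the reasoning query costs about 21 times as much, even at identical rates. Latency grows in a similar way because output tokens are generated one at a time. At a fixed decoding speed, 2,050 output tokens take roughly 41 times as long to produce as 50, ignoring prompt processing.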

Reasoning models are strongest on tasks with verifiable answers, such as mathematics, formal logic and code correctness, which are also the tasks their outcome-based training rewards most directly. They offer smaller gains on open-ended tasks where there is no ground truth. They can still hallucinate, and longer deliberation does not guarantee correctness; some research has documented cases where additional thinking degrades accuracy on simpler tasks.

Reasoning models are also less efficient for tasks that do not benefit from deliberation, such as simple retrieval, format conversion, or conversational responses. Many AI providers therefore offer a tiered product: a fast standard model for everyday tasks and a reasoning model for tasks requiring deep analysis.
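An application built on such a tiered offering needs a rule for deciding which tier handles each request. The sketch below is a deliberately naive keyword-and-length heuristic. The model identifiers `fast-model` and `reasoning-model`, the keyword list and the length threshold are all hypothetical, and any real router would need tuning against the application's own traffic.

```python
# Hypothetical model identifiers; substitute the names your provider uses.
FAST_MODEL = "fast-model"
REASONING_MODEL = "reasoning-model"

# Illustrative cues that a request involves multi-step analysis.
REASONING_CUES = ("prove", "derive", "calculate", "debug", "step by step", "analyse", "analyze")
LONG_PROMPT_CHARS = 2_000  # hypothetical threshold for "complex enough to deliberate"


def choose_model(prompt: str) -> str:
    """Route deliberation-heavy requests to the reasoning tier, the rest to the fast tier."""
    text = prompt.lower()
    if any(cue in text for cue in REASONING_CUES) or len(prompt) > LONG_PROMPT_CHARS:
        return REASONING_MODEL
    return FAST_MODEL


if __name__ == "__main__":
    print(choose_model("Convert this date to ISO 8601: 3 March 2025"))
    print(choose_model("Calculate the loan amortisation schedule and check each step"))
```

Defaulting to the fast tier keeps routine traffic cheap. The cost of a misroute is asymmetric: sending a hard problem to the fast model risks a wrong answer, while sending an easy one to the reasoning model only wastes tokens.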

Notable Models

| Model | Provider | Released | Key benchmark |
|---|---|---|---|
| o1 | OpenAI | September 2024 (as o1-preview) | 83.3% on AIME 2024 (consensus over 64 samples) |
| o3 | OpenAI | December 2024 (announced; released April 2025) | 75.7% on ARC-AGI |
| DeepSeek-R1 | DeepSeek | January 2025 | Comparable to o1 |
| Gemini 2.0 Flash Thinking | Google | January 2025 | Competitive on MATH |
| Claude 3.7 Sonnet | Anthropic | February 2025 | Strong on SWE-bench |

Malaysian Context — Reasoning Models in Enterprise and Education

Reasoning models are gaining adoption in Malaysia across sectors where analytical rigour is valued. In Malaysian banking and finance, institutions such as Maybank, CIMB, and RHB are evaluating reasoning models for applications in risk assessment, regulatory compliance analysis, and financial modelling, where step-by-step justification is as important as the final answer for audit purposes.

The Malaysian legal and professional services sector is an early adopter. Law firms in Kuala Lumpur have piloted reasoning models for contract review and statutory interpretation. In these tasks the extended deliberation trace offers a degree of interpretability that standard LLM outputs lack. However, the trace is not guaranteed to reflect faithfully how the model reached its answer, and some providers show users only a summary of it. This use of reasoning traces aligns with expectations set by the Malaysia AI Governance Framework, which emphasises accountability and transparency in high-stakes AI applications.

MDEC and the National AI Office Malaysia have identified reasoning models as strategically relevant to the national AI agenda because they extend AI capability into domains — advanced engineering, scientific research, financial analysis — that are priorities for Malaysia's ambition to move up the value chain. The University of Malaya, Universiti Teknologi Malaysia, and private institutions in the MDEC Premier Digital Tech Institution programme are incorporating reasoning model concepts into AI research curricula.

For Malaysian AI startups building products on foundation model APIs, reasoning models offer a path to applications that were previously not tractable with standard models. The higher per-query cost is manageable for low-volume, high-value professional use cases, which is a common commercial profile for B2B AI tools in the Malaysian market. Amazon Bedrock, Azure AI, and Google Vertex AI — all present in Malaysia's data centre landscape — include reasoning model access in their managed offerings.

References

OpenAI. (2024). Learning to reason with LLMs. OpenAI Research Blog. https://openai.com/index/learning-to-reason-with-llms/
DeepSeek-AI. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv:2501.12948.
Wei, J. et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35.
Zylos Research. (2026). AI reasoning models 2026: From OpenAI o3 to DeepSeek-R1 and the test-time compute revolution. Zylos.ai.
Snell, C. et al. (2024). Scaling LLM test-time compute optimally can be more effective than scaling model parameters. arXiv:2408.03314.

Tags: reasoning-models chain-of-thought test-time-compute o1 deepseek-r1

Type	LLM training and inference paradigm
Key technique	Test-time compute scaling via reinforcement learning
Notable examples	OpenAI o1, o3; DeepSeek-R1; Gemini 2.0 Flash Thinking
Introduced commercially	September 2024 (OpenAI o1)
Related	Chain-of-thought prompting, RLHF, Inference, Hallucination