Test-Time Compute

Test-time compute is the computational effort a model expends during inference rather than during training. In this paradigm, large language models improve their reasoning by generating and evaluating more intermediate steps before answering.

4 min read | Last updated June 2026 | Foundations

Overview

Test-time compute, also called inference-time scaling, describes the amount of computation a model uses when generating an answer, as opposed to the computation used to train it. The concept rose to prominence in 2024 and 2025 with the emergence of reasoning models that deliberately think for longer before responding. Rather than relying solely on larger models trained on more data, this approach improves performance by allocating additional compute at the moment a question is asked.

The central insight is that, for difficult problems, letting a model generate, explore and evaluate many intermediate reasoning steps can deliver accuracy gains that scaling up model size and training alone does not. Empirical results (Snell et al., 2024) suggest that a smaller model given substantially more inference compute can rival a much larger model using standard inference.
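The compute comparison behind this claim can be illustrated with back-of-envelope arithmetic. The sketch below uses the common rule of thumb that a dense transformer's forward pass costs roughly 2 FLOPs per parameter per generated token. It ignores attention over long contexts, prompt processing and memory bandwidth, and the model sizes, token counts and the `inference_flops` helper are hypothetical values chosen for illustration, not figures from the cited work.

```python
def inference_flops(n_params: float, n_tokens: int, n_samples: int = 1) -> float:
    """Rough generation cost: ~2 FLOPs per parameter per generated token.

    This is an approximation for dense transformers; it ignores attention
    cost over the context and any prompt-processing overhead.
    """
    return 2.0 * n_params * n_tokens * n_samples


# Hypothetical comparison: a 7B-parameter model sampled 10 times
# versus a 70B-parameter model sampled once, 500 tokens per answer.
small = inference_flops(n_params=7e9, n_tokens=500, n_samples=10)
large = inference_flops(n_params=70e9, n_tokens=500, n_samples=1)

print(f"small model, 10 samples: {small:.2e} FLOPs")
print(f"large model, 1 sample:   {large:.2e} FLOPs")
print(f"ratio: {small / large:.2f}")
```

Under these assumptions the two configurations spend the same inference budget, so the interesting question is which one answers more accurately for that budget. That depends on the task and has to be measured rather than computed.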

Approaches

Test-time scaling techniques fall into three broad categories: sequential, parallel and search-based.

Sequential scaling

The model produces an extended chain of thought, working through a problem step by step and sometimes revising earlier steps. Reasoning models such as OpenAI's o1 and o3 series, DeepSeek-R1 and reasoning-tuned versions of Gemini are trained to generate long internal reasoning traces before committing to a final answer.

Parallel scaling

The model generates many independent candidate answers and selects among them. Self-consistency samples multiple reasoning paths and takes a majority vote, while best-of-n sampling uses a verifier or reward model to pick the strongest candidate.
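Both selection rules can be sketched in a few lines. In the example below, `toy_sample` is a hypothetical stand-in for a call to a language model and `toy_verifier` is a hypothetical stand-in for a verifier or reward model; neither corresponds to a real API. The toy task is to find the positive square root of 1764, which is easy to check and so makes a convenient verifier.

```python
import random
from collections import Counter
from typing import Callable


def self_consistency(sample: Callable[[], str], n: int) -> str:
    """Draw n independent answers and return the most frequent one."""
    answers = [sample() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def best_of_n(sample: Callable[[], str], score: Callable[[str], float], n: int) -> str:
    """Draw n independent answers and return the one the scorer rates highest."""
    candidates = [sample() for _ in range(n)]
    return max(candidates, key=score)


# Hypothetical noisy "model": answers "42" about 60% of the time.
rng = random.Random(0)


def toy_sample() -> str:
    return "42" if rng.random() < 0.6 else rng.choice(["40", "41", "43"])


def toy_verifier(answer: str) -> float:
    """Hypothetical verifier: a higher score means answer**2 is closer to 1764."""
    return -abs(int(answer) ** 2 - 1764)


print(self_consistency(toy_sample, n=25))
print(best_of_n(toy_sample, toy_verifier, n=8))
```

Majority voting needs no extra model but only works when answers can be compared for equality. Best-of-n handles free-form outputs but is only as good as its scorer.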

Search-based scaling

Techniques borrowed from classical search, including tree-of-thoughts and variants of Monte Carlo tree search, let the model branch into multiple lines of reasoning, evaluate them and prune weak paths. These methods trade additional compute for more thorough exploration of the solution space.
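The branch, evaluate and prune loop can be illustrated with a plain beam search, used here as a simple stand-in for the tree-of-thoughts family rather than an implementation of any published method. The toy puzzle (reach a target number from 1 using "+3" and "*2" steps), the `expand` function and the distance-based evaluator are all assumptions made for the example. In a real system, expansion and evaluation would both be model calls.

```python
from typing import List, Optional, Tuple

# A state is (current value, steps taken so far).
State = Tuple[int, Tuple[str, ...]]


def expand(state: State) -> List[State]:
    """Branch: propose the possible next steps from a partial solution."""
    value, steps = state
    return [(value + 3, steps + ("+3",)), (value * 2, steps + ("*2",))]


def beam_search(start: int, target: int, beam_width: int, max_depth: int) -> Optional[State]:
    """Keep only the beam_width most promising partial solutions at each depth."""
    frontier: List[State] = [(start, ())]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in expand(state)]
        for candidate in candidates:
            if candidate[0] == target:
                return candidate
        # Evaluate: score each candidate by its distance to the target.
        candidates.sort(key=lambda s: abs(target - s[0]))
        # Prune: discard everything outside the beam.
        frontier = candidates[:beam_width]
    return None


narrow = beam_search(start=1, target=22, beam_width=2, max_depth=5)
wide = beam_search(start=1, target=22, beam_width=4, max_depth=5)
print(narrow)
print(wide)
```

In this toy, the narrow beam prunes the branch that leads to the shortest solution and finds a longer route, while the wider beam evaluates more candidates per level and recovers the four-step path. This is the compute-for-exploration trade in miniature.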

Trade-offs

Test-time compute exposes a tunable trade-off between cost, latency and quality. Spending more compute improves results on hard reasoning, mathematics and coding tasks but increases response time and expense, so systems may allocate effort adaptively based on estimated difficulty. Research in 2025 also documented an over-reasoning effect, where excessive deliberation on easy questions wastes resources and can even degrade calibration. Designing models that decide how much to think remains an active research area.
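One simple way to allocate effort adaptively is to keep sampling only while the answers disagree. The sketch below is an illustrative early-stopping variant of majority voting, not a method attributed to any particular system. The `adaptive_vote` function, its thresholds and the two toy samplers are hypothetical.

```python
import itertools
from collections import Counter
from typing import Callable, List, Tuple


def adaptive_vote(
    sample: Callable[[], str],
    min_samples: int = 3,
    max_samples: int = 15,
    agreement: float = 0.8,
) -> Tuple[str, int]:
    """Sample until the leading answer's share reaches `agreement`,
    or until the budget of max_samples is spent.

    Returns (chosen answer, number of samples used).
    """
    answers: List[str] = []
    while len(answers) < max_samples:
        answers.append(sample())
        if len(answers) >= min_samples:
            top, count = Counter(answers).most_common(1)[0]
            if count / len(answers) >= agreement:
                return top, len(answers)
    top, _ = Counter(answers).most_common(1)[0]
    return top, len(answers)


# Hypothetical "easy" question: the model always agrees with itself.
print(adaptive_vote(lambda: "4"))

# Hypothetical "hard" question: the answers keep disagreeing.
hard_stream = itertools.cycle(["7", "9", "7", "8"])
print(adaptive_vote(lambda: next(hard_stream), max_samples=8))
```

The easy question stops at the minimum sample count, while the hard one consumes its full budget. Disagreement among samples is only a rough proxy for difficulty, so production systems may rely on other signals.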

Significance

The shift toward test-time compute has reshaped how the field thinks about progress. For much of the previous decade, gains came from scaling training. Inference-time scaling adds a complementary axis, with implications for hardware demand, since serving reasoning models requires more compute per query, and for the economics of deploying AI at scale.

Malaysian Context — Test-Time Compute and AI Infrastructure

The rise of test-time compute increases demand for inference-grade computing, a trend directly relevant to Malaysia's emergence as a regional data-centre hub. Large data-centre investments in Johor (including Kulai) and Selangor (including Cyberjaya), made by hyperscalers such as Microsoft, Google and Amazon Web Services and by local players such as YTL Power with its AI data-centre campus, position the country to host inference workloads for Southeast Asia.

Reasoning models that consume more compute per query raise the cost and energy footprint of AI services, an important consideration for Malaysia's national AI agenda and for sustainability goals advanced under the MyDIGITAL blueprint. The National AI Office, established to coordinate Malaysia's AI strategy, together with agencies such as MDEC and MIMOS, is concerned with ensuring access to affordable compute for local developers and enterprises.

For Malaysian sovereign language-model efforts such as MaLLaM and ILMU, test-time scaling offers a route to stronger reasoning in Malay and other local languages without the expense of training ever-larger models. The country's growing GPU capacity, in turn, supports research at universities and start-ups within the regional AI ecosystem.

References

OpenAI. (2024). Learning to Reason with LLMs. o1 system documentation.
Snell, C., Lee, J., Xu, K. and Kumar, A. (2024). Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters. arXiv.
DeepSeek-AI. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv.

Tags: reasoning models, inference scaling, chain-of-thought

Type	Inference-time scaling paradigm
Also called	Inference-time scaling
Key idea	Spend more compute when answering
Notable models	OpenAI o1/o3, DeepSeek-R1, Gemini reasoning
Related	Reasoning models, Chain-of-thought prompting