What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics, including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, the local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Phi (Language Model)

A family of small language models developed by Microsoft Research that demonstrate strong reasoning and instruction-following at parameter counts an order of magnitude smaller than typical frontier models.

4 min read · Last updated May 2026 · Models

The Phi family is a series of small language models (SLMs) developed by Microsoft Research. The models are designed to deliver competitive reasoning, mathematics and instruction-following performance at parameter counts substantially below those of typical frontier large language models. The series showed that careful curation of "textbook-quality" training data, together with synthetic data generated by larger teacher models, can partly compensate for raw scale. This contributed to the broader 2024–2025 shift toward efficient SLMs for on-device and edge deployment.

Origins

The Phi line began with Phi-1, a 1.3-billion-parameter model released in June 2023 that targeted Python code generation. It was trained on a curated mixture of filtered "textbook-quality" code from the web and synthetic textbook and exercise material generated by GPT-3.5. Phi-1.5, also at 1.3 billion parameters, followed in September 2023, and Phi-2 followed in December 2023. Phi-2 reached 2.7 billion parameters and matched or exceeded several 13-billion-parameter models on common reasoning benchmarks. The Phi-3 family, released from April 2024, widened the lineup to mini (3.8 billion), small (7 billion) and medium (14 billion) variants. It also introduced long-context versions supporting 128,000 tokens.

Phi-4 family

The Phi-4 generation, released across late 2024 and 2025, comprises several variants. The base Phi-4, announced in December 2024, is a 14-billion-parameter dense model focused on complex reasoning, mathematics and coding. Phi-4-mini, announced on 26 February 2025, is a 3.8-billion-parameter text-only model with a 128,000-token context window, optimised for low latency and edge deployment. Phi-4-multimodal, released alongside Phi-4-mini, is a 5.6-billion-parameter model that processes speech, vision and text inputs within a single unified architecture. Phi-4-reasoning and Phi-4-reasoning-plus, released on 30 April 2025, are 14-billion-parameter variants of Phi-4 fine-tuned on curated chain-of-thought reasoning traces. Reasoning-plus adds a further outcome-based reinforcement learning phase. Microsoft reports that both outperform OpenAI's o1-mini and DeepSeek-R1-Distill-Llama-70B on a range of mathematical and PhD-level science benchmarks.
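Fine-tuning on reasoning traces and an outcome-based reinforcement learning phase differ in what gets rewarded. The minimal Python sketch below shows a generic outcome-based reward. It assumes that the model ends each response with a line of the form `Final answer: ...` and that answers are compared by exact string match. It illustrates the idea only and is not the reward function Microsoft used for Phi-4-reasoning-plus.

```python
import re
from typing import Optional

# Assumed answer convention (hypothetical): the response contains a line such
# as "Final answer: 42". Real training pipelines define their own extraction rules.
ANSWER_PATTERN = re.compile(r"^Final answer:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)


def extract_final_answer(response: str) -> Optional[str]:
    """Return the last 'Final answer:' value in a response, or None if absent."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1].strip() if matches else None


def outcome_reward(response: str, reference: str) -> float:
    """Return 1.0 if the final answer matches the reference, otherwise 0.0.

    The intermediate chain of thought is never inspected: only the outcome
    is scored, which is what makes the reward outcome-based rather than
    process-based.
    """
    answer = extract_final_answer(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


if __name__ == "__main__":
    trace = "6 * 7 = 42, so the product is 42.\nFinal answer: 42"
    print(outcome_reward(trace, "42"))                      # 1.0
    print(outcome_reward("Final answer: 41", "42"))         # 0.0
    print(outcome_reward("No conclusion reached.", "42"))   # 0.0
```

Because only the final answer is checked, a policy trained against a reward like this is free to change how it reasons, provided its conclusions become more accurate. Practical reward designs may add further terms, for example for formatting or length, which are omitted here.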

Training methodology

The Phi series is closely associated with the "textbooks are all you need" hypothesis advanced by Microsoft Research (Gunasekar et al., 2023). The hypothesis holds that small models can rival much larger ones when trained on data whose quality, pedagogical structure and diversity are tightly controlled. The training mixture combines filtered web data, code, and synthetic data generated by larger frontier models acting as teachers. The reasoning-oriented Phi-4 variants additionally train on long chain-of-thought traces, often distilled from frontier reasoning models.
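The filtering-and-mixing idea can be made concrete in code. The Python sketch below keeps only documents whose quality score clears a threshold, then samples a training batch according to per-source mixture weights. Gunasekar et al. (2023) describe filtering web code with a learned quality classifier. However, the `Document` fields, scores, threshold and weights here are hypothetical placeholders, not values used for any Phi model.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Document:
    text: str
    source: str     # hypothetical source label: "web", "code" or "synthetic"
    quality: float  # assumed score in [0, 1] from some quality classifier


# Hypothetical threshold and mixture weights, for illustration only.
QUALITY_THRESHOLD = 0.7
MIXTURE_WEIGHTS: Dict[str, float] = {"web": 0.3, "code": 0.2, "synthetic": 0.5}


def filter_by_quality(docs: List[Document],
                      threshold: float = QUALITY_THRESHOLD) -> List[Document]:
    """Keep only documents whose quality score meets the threshold."""
    return [d for d in docs if d.quality >= threshold]


def sample_mixture(docs: List[Document], n: int,
                   weights: Dict[str, float] = MIXTURE_WEIGHTS,
                   seed: int = 0) -> List[Document]:
    """Draw n documents: pick a source by weight, then a document from that source."""
    rng = random.Random(seed)
    pools: Dict[str, List[Document]] = {}
    for d in docs:
        pools.setdefault(d.source, []).append(d)
    # Skip sources with no surviving documents; random.choices accepts
    # unnormalised weights, so the remaining weights need no rescaling.
    sources = [s for s in weights if pools.get(s)]
    if not sources:
        return []
    probs = [weights[s] for s in sources]
    batch = []
    for _ in range(n):
        source = rng.choices(sources, weights=probs, k=1)[0]
        batch.append(rng.choice(pools[source]))
    return batch


if __name__ == "__main__":
    corpus = [
        Document("A worked introduction to recursion ...", "synthetic", 0.92),
        Document("Unrelated forum chatter ...", "web", 0.35),
        Document("def fib(n): ...", "code", 0.81),
    ]
    kept = filter_by_quality(corpus)
    print([d.source for d in kept])          # ['synthetic', 'code']
    print(len(sample_mixture(kept, n=4)))    # 4
```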

Deployment and licensing

Recent Phi models, including the Phi-3 and Phi-4 families, are released as open weights under the MIT licence. They are available through Azure AI Foundry and the Hugging Face Hub, with ONNX-optimised builds for ONNX Runtime. Quantised variants target deployment on NVIDIA GPUs, Apple Silicon, Qualcomm NPUs and other edge accelerators. The combination of a small footprint and strong reasoning makes Phi a popular base for on-device agents, retrieval pipelines and private enterprise deployments.
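The small footprint can be quantified with simple arithmetic: weight memory is roughly the parameter count multiplied by the bits stored per parameter, divided by eight. The Python sketch below applies that rule to the parameter counts given in this article, assuming 1 GB = 10^9 bytes. It counts weights only. The KV cache (which grows with context length), activations, runtime overhead and the scaling factors stored by real 4-bit formats all add to these figures.

```python
from typing import Dict

# Parameter counts (in billions) as stated in this article.
MODELS: Dict[str, float] = {"Phi-4-mini": 3.8, "Phi-4-multimodal": 5.6, "Phi-4": 14.0}


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, runtime overhead and quantisation
    metadata, so real deployments need additional headroom.
    """
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        fp16 = weight_memory_gb(params, 16)
        int4 = weight_memory_gb(params, 4)
        print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

On these assumptions, the 14-billion-parameter Phi-4 needs about 28 GB for 16-bit weights but only about 7 GB at 4 bits. That reduction is the arithmetic behind running quantised Phi variants on a single GPU or edge accelerator.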

Reception and significance

The Phi family is regularly cited as evidence that scaling is not the only path to capability, and that data quality, instruction tuning and reinforcement-learning post-training can shift the Pareto frontier of cost versus performance. It is often discussed alongside other small-model efforts. These include Google's open-weight Gemma models, Mistral's small models, Apple's on-device foundation models and the broader wave of 1B–14B reasoning models released through 2025.
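A new model shifts the cost-performance Pareto frontier when no existing model is both cheaper and at least as good. The following Python sketch computes such a frontier. The model names, costs and scores are invented placeholders for illustration, not measurements of Phi or any competing model.

```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    name: str
    cost: float   # hypothetical cost per unit of inference (lower is better)
    score: float  # hypothetical benchmark score (higher is better)


def dominates(q: ModelPoint, p: ModelPoint) -> bool:
    """True if q is no costlier and no worse than p, and strictly better on one axis."""
    return (q.cost <= p.cost and q.score >= p.score
            and (q.cost < p.cost or q.score > p.score))


def pareto_frontier(points: List[ModelPoint]) -> List[ModelPoint]:
    """Return the non-dominated points, sorted from cheapest to most expensive."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.cost)


if __name__ == "__main__":
    before = [
        ModelPoint("large-A", 10.0, 0.80),
        ModelPoint("medium-B", 4.0, 0.65),
        ModelPoint("small-C", 1.0, 0.50),
    ]
    # Adding a cheap but capable model pushes medium-B off the frontier.
    after = before + [ModelPoint("small-D", 1.5, 0.70)]
    print([p.name for p in pareto_frontier(before)])  # ['small-C', 'medium-B', 'large-A']
    print([p.name for p in pareto_frontier(after)])   # ['small-C', 'small-D', 'large-A']
```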

Malaysian Context — SLMs and sovereign AI deployment

Phi and other small language models are particularly relevant to Malaysian AI strategy because they fit within the cost, latency and data-sovereignty constraints that dominate procurement decisions in regulated industries. The National AI Office Malaysia and MDEC have repeatedly highlighted on-premises and edge deployment as priorities. The reasons are compliance with the Personal Data Protection Act (PDPA) 2010 (as amended in 2024) and resilience against disruptions to cross-border data flows.

Domestic banks such as Maybank, CIMB and RHB are subject to Bank Negara Malaysia's (BNM) Risk Management in Technology (RMiT) policy, which places conditions on cloud usage for certain workloads. Small models in the 3–14 billion parameter range, including Phi-4 variants, can be hosted in domestic data centres operated by Telekom Malaysia (TM), Maxis, TIME dotCom or YTL Communications. Another option is the YTL AI Cloud in Johor, which is built on NVIDIA H100 GPUs.

In the public sector, MAMPU, the Department of Statistics Malaysia (DOSM) and several state digital agencies are evaluating SLMs for three uses: Bahasa Malaysia document processing, multilingual citizen services in Tamil and Mandarin, and offline workflows in remote areas of Sabah and Sarawak. HRD Corp has funded technical training that covers deploying and fine-tuning Phi-class models on Azure AI Foundry, using Microsoft's Bandar Enstek data centre region announced in 2024.

References

Gunasekar, S. et al. (2023). Textbooks Are All You Need. Microsoft Research, arXiv:2306.11644.
Microsoft. (2025). Introducing Phi-4: Microsoft's Newest Small Language Model. Microsoft Tech Community, Azure AI Foundry Blog.
Microsoft. (2025). Phi-4-reasoning and Phi-4-reasoning-plus Technical Report. Microsoft Research.
Ministry of Science, Technology and Innovation Malaysia. (2021). National Artificial Intelligence Roadmap 2021–2025.

Tags: phi, small language models, microsoft, reasoning

Developer	Microsoft Research
First released	June 2023 (Phi-1)
Latest	Phi-4 family (2025)
Parameter range	1.3B – 14B
Licence	MIT / open weights via Azure AI Foundry and Hugging Face
Related	Mistral, Llama, Qwen