What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Small Language Models

Small language models (SLMs) are compact language models with fewer than around 10 billion parameters, designed for efficient deployment on edge devices, mobile hardware, and resource-constrained environments.

6 min read | Last updated June 2026 | Models

Small language models (SLMs) are a category of language model characterised by a comparatively low parameter count — typically below 10 billion parameters — that achieves strong performance on a broad range of tasks while remaining feasible to deploy on consumer hardware, mobile devices, and industrial edge systems without cloud connectivity. The category emerged as a response to the cost, latency, and privacy limitations of large cloud-hosted language models, and has grown rapidly since 2023 as training methodology advances allowed smaller models to approach or match the quality of much larger predecessors.

The boundary between "small" and "large" is not formally defined and has shifted over time. Models that were considered large when they were released, such as GPT-2 (2019) at 1.5 billion parameters, are now firmly in the small category. In 2025 and 2026, the practical threshold for SLMs is often placed at 7 billion parameters, with 10 billion as a looser upper bound, although sub-billion-parameter models have demonstrated utility on highly focused tasks.

Why Small Models Matter

The dominance of large language models such as GPT-4, Claude, and Gemini Ultra has obscured an important practical reality: most enterprise and consumer AI tasks do not require frontier-scale reasoning. Summarising a customer support ticket, classifying product reviews, extracting structured data from a document, or answering domain-specific questions can often be performed with high accuracy by a well-trained 3-7 billion parameter model.

Deploying a smaller model carries several advantages. Inference cost is dramatically reduced: a 3.8B parameter model needs roughly an order of magnitude less compute per token than a 70B model, and it can run on a single consumer GPU rather than a cloud cluster. Latency improves when the model runs on the device, because there is no network round-trip and the weights fit entirely in local memory. Privacy is easier to preserve because sensitive data does not need to leave the device. The model also keeps working without an internet connection, which matters in industrial plants, aircraft, remote field operations, and healthcare facilities.
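As a rough check on the compute gap between a 3.8B and a 70B model, a common rule of thumb puts the forward-pass cost of a dense transformer at about 2 floating-point operations per parameter per generated token. The sketch below applies that approximation. It ignores attention over long contexts, batching, hardware utilisation, and pricing, so the ratio is indicative only; the function name is illustrative.

```python
def forward_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs per token for a dense transformer.

    Uses the rule of thumb of about 2 FLOPs per parameter per token.
    """
    return 2.0 * n_params


small = forward_flops_per_token(3.8e9)   # 3.8B parameter model
large = forward_flops_per_token(70e9)    # 70B parameter model
compute_ratio = large / small

print(f"A 70B model needs about {compute_ratio:.1f}x the compute per token of a 3.8B model")
```

Under this approximation the gap is a little over one order of magnitude, before accounting for the extra hardware and networking a cloud-served model needs.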

Key Models

Microsoft Phi Series

The Phi series, developed by Microsoft Research, demonstrated that careful curation of training data could yield models that punch well above their weight. Phi-1 (2023) reached 50.6% pass@1 on the HumanEval Python coding benchmark with only 1.3 billion parameters, competitive with far larger models at the time. It was trained on a small corpus of "textbook-quality" data, combining heavily filtered web content with synthetic textbooks and exercises, rather than on noisy web crawls.

Phi-3 (2024) extended this approach. Phi-3-mini, at 3.8 billion parameters, delivered performance comparable to GPT-3.5 on standard benchmarks and, when quantised to 4 bits, occupies about 1.8GB of memory, small enough to run on a recent smartphone. The Phi-4 family followed: the original Phi-4 (December 2024) has 14 billion parameters, above the usual SLM threshold, while Phi-4-mini (3.8B) and Phi-4-multimodal (5.6B), released in 2025, added longer context and image and audio input at small-model scale.

Google Gemma Series

Gemma, released by Google DeepMind in 2024, is a family of open-weight models built from the same research and technology as Gemini. Gemma 2 offered 2B, 9B, and 27B variants with strong reasoning and instruction-following capability for their size. Gemma 3 (2025) spans 1B to 27B parameters; its 4B and larger variants accept image input and support a 128K token context window, while the 1B model is text-only with a shorter context window.

Meta Llama 3.2

Llama 3.2, released in September 2024, included 1B and 3B parameter text models specifically designed for mobile and edge deployment. These models were produced by pruning and distilling larger Llama 3.1 models, which let them retain much of the larger models' quality on tasks such as summarisation and instruction following. Meta released them under the Llama 3.2 Community License, which permits commercial use subject to its conditions.

Qwen and Other Chinese SLMs

Alibaba's Qwen series has produced competitive small models, including Qwen2-1.5B and Qwen2-7B, with strong multilingual capability across Chinese, Malay, Indonesian, and other Southeast Asian languages. This makes them particularly relevant for regional deployment.

Training Techniques for Small Models

Several techniques enable small models to achieve quality beyond what raw parameter count would suggest.

Synthetic data training uses AI-generated "textbook" content that is intended to be dense, accurate, and pedagogically structured. Microsoft's Phi series relies heavily on this approach, which focuses learning capacity on signal-rich examples rather than low-quality web noise.

Distillation transfers knowledge from a large teacher model to a smaller student model by training the student to reproduce not just the correct output but the teacher's full probability distribution over outputs. Distillation often improves small-model quality for a modest additional training cost, provided a suitable teacher model is available.
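The idea of matching the teacher's probability distribution can be sketched for a single prediction. The example below shows one common formulation, not necessarily the one used by any model named in this article: a weighted sum of the KL divergence between temperature-softened teacher and student distributions and the ordinary cross-entropy against the correct label. The `temperature` and `alpha` values, and the function names, are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft (teacher-matching) and a hard (label) loss."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Soft term: KL(teacher || student) on the softened distributions,
    # scaled by temperature squared as is conventional.
    kl = sum(
        pt * (math.log(pt) - math.log(max(ps, 1e-12)))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    soft_loss = kl * temperature ** 2

    # Hard term: cross-entropy of the student against the correct label.
    hard_loss = -math.log(max(softmax(student_logits)[target_index], 1e-12))

    return alpha * soft_loss + (1 - alpha) * hard_loss


teacher = [2.0, 1.0, 0.1]
matched = distillation_loss(teacher, teacher, target_index=0, alpha=1.0)
mismatched = distillation_loss([0.0, 0.0, 0.0], teacher, target_index=0, alpha=1.0)
print(matched, mismatched)
```

With `alpha=1.0` only the teacher-matching term remains, so a student whose logits equal the teacher's gets a loss of zero and any other student gets a positive loss.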

Quantisation reduces the numerical precision of model weights from 32-bit or 16-bit floating point to 8-bit integers or 4-bit representations. A 7B model quantised to 4-bit occupies roughly 4GB of memory once runtime overhead is included, fitting on a single consumer GPU or a high-end smartphone. Tools such as llama.cpp (with its GGUF file format) and Apple's Core ML enable quantised SLM inference on a wide range of hardware.
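The memory arithmetic and the basic rounding step can be sketched as follows. This is a simplified illustration under stated assumptions: the memory estimate counts weights only (quantisation scales, activations, and the attention cache add overhead on top, which is why practical figures are somewhat higher), and the quantiser is a toy symmetric 8-bit scheme with one scale per list of weights, not the block-wise formats that production tools use. All function names are hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def quantise_int8(weights):
    """Symmetric 8-bit quantisation: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantised = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantised, scale


def dequantise(quantised, scale):
    """Recover approximate float weights from integers and the shared scale."""
    return [q * scale for q in quantised]


print(f"7B at 16-bit: {weight_memory_gb(7e9, 16):.1f} GB")  # 14.0 GB
print(f"7B at 4-bit:  {weight_memory_gb(7e9, 4):.1f} GB")   # 3.5 GB

original = [0.5, -1.0, 0.25, 0.03]
q, scale = quantise_int8(original)
restored = dequantise(q, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(q, f"max reconstruction error: {max_error:.5f}")
```

Each restored weight differs from the original by at most half the scale, which is the precision given up in exchange for the smaller footprint.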

Instruction tuning and RLHF align small models with human preferences and specific task formats, improving usability without requiring additional parameters.
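The supervised part of instruction tuning can be sketched as preparing (instruction, response) pairs so that the model is trained to produce the response given the instruction. The example below is a minimal illustration and does not cover RLHF. The prompt template is hypothetical, whitespace splitting stands in for a real tokenizer, and masking the prompt out of the loss is a common choice rather than a universal one.

```python
def build_example(instruction: str, response: str):
    """Turn one instruction/response pair into tokens plus a loss mask.

    The mask is 0 for prompt tokens (not trained on) and 1 for response
    tokens (trained on).
    """
    # Hypothetical template; real models each define their own format.
    prompt = f"### Instruction:\n{instruction}\n### Response:\n"

    prompt_tokens = prompt.split()      # stand-in for a real tokenizer
    response_tokens = response.split()

    tokens = prompt_tokens + response_tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask


tokens, loss_mask = build_example(
    "Classify the sentiment of this review: great phone, fast delivery",
    "positive",
)
for token, mask in zip(tokens, loss_mask):
    print(mask, token)
```

Only the final token, the response, carries a mask of 1 here, so a training loop using this mask would update the model on the answer and not on the instruction text.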

Use Cases

On-device personal assistants, offline document processing, embedded quality control in manufacturing, diagnostic support in rural healthcare facilities, and real-time translation on consumer devices are among the most common SLM deployment scenarios. In enterprise settings, SLMs are frequently fine-tuned on domain-specific corpora — legal documents, engineering manuals, or medical records — to create specialised models that outperform general-purpose large models on narrow tasks.

Malaysian Context — SLMs for Localisation and Edge Deployment

Small language models have attracted particular interest in Malaysia because of their potential to run affordably on local infrastructure without dependency on overseas cloud providers. The MyDIGITAL blueprint (the Malaysia Digital Economy Blueprint), launched by the Malaysian government in 2021 and updated in 2024, emphasises data sovereignty and locally hosted AI services for the public sector, goals that SLMs are well positioned to support.

MDEC (Malaysia Digital Economy Corporation) has advocated for SLMs as part of its AI democratisation agenda, highlighting that smaller models can be fine-tuned by Malaysian technology companies on Bahasa Malaysia corpora to build locally relevant products. Startups such as Mesolitica have released open Bahasa Malaysia language models in the 1B-7B parameter range, enabling local fine-tuning without reliance on English-centric frontier models.

In Malaysia's manufacturing sector — particularly the Penang electronics and semiconductor cluster — SLMs are deployed on factory floors for real-time defect classification, equipment diagnostics, and maintenance documentation. Panasonic, Intel, and Bosch facilities in Malaysia have piloted on-device AI inference, citing data security and sub-50 millisecond latency requirements that rule out cloud LLM APIs.

The Multimedia Super Corridor (MSC Malaysia) community has seen a wave of SLM-based product development, including Bahasa Malaysia chatbots for customer service, automated Malay-language document extraction for legal and government applications, and speech-to-text tools tuned for Malaysian accents. Companies such as Fusionex, Silverlake Axis, and a growing cohort of MSC-status startups have deployed or are evaluating SLMs as an alternative to cloud API costs.

HRD Corp-registered training providers have begun offering SLM fine-tuning courses that cover quantisation, deployment with llama.cpp, and domain adaptation, responding to employer demand for practitioners who can build and maintain on-premise language AI without cloud dependency.

References

Gunasekar, S., et al. (2023). Textbooks Are All You Need. arXiv:2306.11644.
Abdin, M., et al. (2024). Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. arXiv:2404.14219.
Google DeepMind. (2025). Gemma 3 Model Card. ai.google.dev/gemma.
Meta AI. (2024). Llama 3.2: Lightweight Models for Mobile and Edge. ai.meta.com.
IBM. (2025). What Are Small Language Models? ibm.com/think/topics/small-language-models.

Tags: small language models, SLM, edge AI, Phi, Gemma, on-device AI

Type	Compact generative language models
Parameter range	Under 10 billion parameters (typically 1B-7B)
Key examples	Phi-3, Phi-4, Gemma 3, Llama 3.2
Developed by	Microsoft, Google, Meta and others
Key use	Edge inference, on-device AI, offline applications
Related	Large language models, TinyML, Edge AI, quantisation