What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Gemma

Gemma is a family of open-weight large language models developed by Google DeepMind, built on similar technology to the Gemini series and available for deployment on hardware ranging from laptops to cloud infrastructure.

5 min readLast updated June 2026Models

Gemma is a family of open-weight large language models released by Google DeepMind beginning in February 2024. Built on the same research and infrastructure that underpins Google's Gemini series, Gemma models are designed to be lightweight enough for deployment on consumer hardware — including laptops and edge devices — while remaining competitive with larger proprietary systems on a range of benchmarks. The models are released with open weights under a custom Gemma Terms of Use licence that permits research and commercial use subject to certain restrictions.

Development and Release History

Google DeepMind introduced the original Gemma in February 2024 with two model sizes: a 2-billion-parameter variant and a 7-billion-parameter variant. Both were offered in pre-trained and instruction-tuned forms. The instruction-tuned versions follow conversational prompts and are suitable for direct use in chat applications, while the pre-trained versions are intended for further fine-tuning on specialised tasks.

Gemma 2 followed in June 2024, initially with 9B and 27B parameter variants, and expanded to include a 2B variant in July 2024. Google claimed that the 27B model outperformed substantially larger open models on several standard evaluation benchmarks.

Gemma 3 debuted in March 2025 with four parameter sizes: 1B, 4B, 12B, and 27B. At launch, Google asserted that Gemma 3 outperformed competing open-source models including DeepSeek-V3 and Llama 3 405B on a subset of reasoning and coding benchmarks.

Gemma 4, released in April 2026, marked a significant expansion of the family's capabilities. Gemma 4 models are natively multimodal, accepting both text and image input and generating text output. The release includes models in four configurations: an effective 2B (E2B) variant, an effective 4B (E4B) variant, a 26B Mixture of Experts (MoE) variant, and a 31B dense model. Gemma 4 models support a context window of up to 256,000 tokens and over 140 languages.

Architecture and Design

Gemma models are based on the transformer decoder architecture with modifications drawn from Google's internal research. Key design choices include the use of multi-query attention to reduce memory bandwidth requirements during inference, rotary positional embeddings (RoPE) for improved length generalisation, and GeGLU activations in the feed-forward layers. These choices collectively improve inference efficiency on consumer-grade GPUs and CPUs.

The Gemma vocabulary is shared with the Gemini model family, enabling straightforward transfer of tokenisation pipelines and embedding initialisations between the two product lines.

Model Variants

The Gemma family includes several specialised derivatives beyond the core instruction-tuned and pre-trained variants. CodeGemma is optimised for code completion and generation tasks and is available in 2B and 7B sizes. PaliGemma is a vision-language variant that combines Gemma language components with a SigLIP vision encoder, enabling image captioning, visual question answering, and object detection. RecurrentGemma experiments with linear recurrent architecture alternatives to full attention for long-context tasks.

Ecosystem and Deployment

Gemma models are supported across the major ML frameworks, including Hugging Face Transformers, JAX, PyTorch, and TensorFlow. Google provides optimised inference kernels for its own hardware (TPUs) as well as for NVIDIA GPUs. The models can be run locally using tools such as Ollama and LM Studio, making them accessible to individual developers without cloud API costs.

Performance and Benchmarks

A central claim for the Gemma series has been strong performance relative to parameter count — sometimes described as being competitive with models two to four times larger. Gemma 3 27B, for instance, was reported to score comparably to models in the 70B range on the MMLU (Massive Multitask Language Understanding) benchmark and outperform certain 70B models on mathematical reasoning evaluations. These results reflect both architectural refinements and the quality of Google DeepMind's training data curation and filtering processes.

Malaysian Context — Gemma in Local AI Development

Google's Gemma family has attracted attention from Malaysian AI practitioners and researchers because the open-weight release allows fine-tuning on proprietary datasets without sending sensitive data to a third-party API. This is particularly relevant for organisations subject to Malaysia's Personal Data Protection Act (PDPA), which governs the handling of personal data and has prompted many local enterprises to explore on-premise or self-hosted model deployments.

Malaysian universities — including Universiti Malaya, Universiti Teknologi Malaysia, and Universiti Sains Malaysia — have incorporated Gemma into research projects on Bahasa Malaysia NLP, exploring instruction fine-tuning on locally curated Malay-language corpora. The relatively small footprint of the 2B and 4B Gemma variants makes them suitable for deployment on university-grade GPU infrastructure without the capital expenditure required for larger model hosting.

MDEC's AI acceleration programmes and the MyDigital Blueprint's emphasis on building domestic AI talent have created demand for accessible open models that Malaysian developers can study, modify, and deploy. Gemma's permissive research licence, comprehensive documentation, and integration with Hugging Face lower the barrier to participation. Local AI startups in Penang, Kuala Lumpur, and Cyberjaya have used Gemma as a foundation for vertical applications in areas such as Bahasa Malaysia customer support automation, legal document summarisation, and educational tutoring.

Google Malaysia maintains an active developer relations function that has promoted Gemma and other Google AI tools through events such as Google I/O Extended Kuala Lumpur and partnerships with Sunway University, Taylor's University, and other institutions participating in the Google Developer Expert programme. These channels have contributed to Gemma's adoption among Malaysian developers who are already embedded in Google's ecosystem.

References

Google DeepMind. (2024). Gemma: Introducing new state-of-the-art open models. Google Blog.
Google DeepMind. (2025). Gemma 3 model card. Google AI for Developers.
Google DeepMind. (2026). Gemma 4: Byte for byte, the most capable open models. Google Blog.
Team, G. et al. (2024). Gemma: Open Models Based on Gemini Research and Technology. Google DeepMind Technical Report.
Wikipedia. (2026). Gemma (language model). Wikimedia Foundation.

Tags:gemma google deepmind open source language model

Developed by	Google DeepMind
Initial release	February 2024
Latest version	Gemma 4 (April 2026)
Licence	Open weights (Gemma Terms of Use)
Architecture	Transformer decoder
Key feature	Lightweight, deployable on consumer hardware