What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

SEA-LION

SEA-LION (Southeast Asian Languages In One Network) is an open-source family of large language models developed by AI Singapore to serve the languages and cultures of Southeast Asia.

5 min readLast updated June 2026Models

SEA-LION, short for Southeast Asian Languages In One Network, is an open-source family of large language models developed by AI Singapore (AISG) and released from 2023 onwards. The project is purpose-built to represent the languages, scripts, and cultural contexts of Southeast Asia, a region whose languages are under-represented in the training data of most globally dominant models. Rather than competing with general-purpose systems such as GPT-4, Claude, or Gemini on broad capability, SEA-LION occupies a gap that large international developers have limited incentive to fill and that most regional organisations lack the compute to address independently.

Languages and training data

SEA-LION centres its training on eleven Southeast Asian languages, including English, Chinese, Malay, Indonesian, Thai, Vietnamese, Filipino, Tamil, Burmese, Khmer, and Lao. The pretraining corpus comprises roughly one trillion tokens, with Southeast Asian languages deliberately over-represented relative to their share in conventional web-scraped datasets. This design choice improves the model's fluency, cultural grounding, and handling of code-switching, a common feature of everyday communication across the region.

To support this work, AISG also produced the Southeast Asian Languages in One Network Data (SEALD) collection, a curated and cleaned multilingual dataset assembled in collaboration with regional partners. High-quality regional data is the principal constraint on building models of this kind, and dataset construction is therefore a core part of the SEA-LION programme rather than an afterthought.

Model versions and architecture

The SEA-LION family has progressed through several generations. Early versions used an in-house transformer architecture trained from scratch. From version 3 onwards, AISG adopted a continued-pretraining strategy built on strong open base models, including Meta's Llama 3 and Google's Gemma, which are further trained on the SEA-LION corpus to inject regional language competence. This approach allows the project to benefit from the engineering investment behind frontier open models while concentrating its own resources on regional adaptation.

The family spans multiple sizes, typically around 3 billion, 7 to 9 billion, and 70 billion parameters, and multiple variants including base models, instruction-tuned chat models, and configurations adapted for retrieval-augmented generation. By 2026, the flagship 70B model represented one of the most capable openly available foundations oriented specifically toward Southeast Asia.

Availability and use

SEA-LION models are distributed openly. Weights are published on Hugging Face and can also be accessed through the official sea-lion.ai application programming interface. The permissive licensing, generally MIT or Apache 2.0 depending on the base model, makes the models suitable for commercial deployment as well as research. Typical applications include multilingual chatbots, government and public-service tools, translation and summarisation for regional languages, and as a base for organisations that wish to fine-tune a model on their own local data.

Significance for the region

SEA-LION is frequently described as one of the first genuinely Southeast-Asia-oriented open large language model foundations. Its importance lies less in raw benchmark scores than in linguistic inclusion: by treating low-resource regional languages as first-class rather than as a long tail, the project provides a shared infrastructure that national initiatives, universities, and companies across ASEAN can build on. A related effort, SeaLLM, pursues similar goals, and the two are often discussed together as anchors of a regional model ecosystem.

Malaysian Context — Regional Models and Malay-Language AI

SEA-LION is directly relevant to Malaysia because Malay (Bahasa Melayu) is one of its core supported languages, and the model's emphasis on regional code-switching reflects how Malaysians actually communicate across Malay, English, Mandarin, and Tamil. Organisations seeking to deploy AI for Malaysian audiences can use SEA-LION as a base rather than relying solely on English-centric models that handle Malay less reliably.

Malaysia has parallel national efforts that complement regional models. MIMOS, the national applied research agency under the Ministry of Science, Technology and Innovation (MOSTI), and locally developed Malay-language models such as MaLLaM and ILMU reflect a broader push toward sovereign and regionally grounded AI. The National AI Office, established to coordinate Malaysia's AI strategy, has emphasised local language capability and data sovereignty as priorities.

Government bodies including MDEC (Malaysia Digital Economy Corporation) and initiatives under the MyDigital Blueprint encourage adoption of AI that serves Malaysian linguistic and cultural needs. For public-sector services, where citizens interact in Malay, models with strong regional grounding reduce the risk of mistranslation and cultural misalignment. Talent-development programmes through HRD Corp and university research groups also benefit from an open, regionally relevant foundation that students and engineers can study and adapt without licensing barriers.

Because SEA-LION is openly licensed, Malaysian firms in regulated sectors such as banking (Maybank, CIMB) and telecommunications (TM, Maxis) can in principle self-host the models within their own infrastructure, addressing data-residency concerns raised under the Personal Data Protection Act (PDPA).

References

AI Singapore. (2024). SEA-LION: Southeast Asian Languages In One Network. https://sea-lion.ai/
AI Singapore. (2024). SEA-LION GitHub Repository. https://github.com/aisingapore/sealion
NVIDIA Developer Blog. (2024). Regional LLMs SEA-LION and SeaLLM Serve Languages and Cultures of Southeast Asia.
Computer Weekly. (2024). Sea-Lion explained: Southeast Asia's first large language model.

Tags:sea-lion large-language-models southeast-asia malay open-source

Type	Large language model family
Developed by	AI Singapore (AISG)
First released	2023
Latest	SEA-LION v3 (up to 70B parameters)
Base models	In-house, then Llama 3 and Gemma
Licence	Open-source (MIT / Apache 2.0)