Mixtral

Mixtral is a family of open-weight sparse mixture-of-experts large language models developed by Mistral AI, comprising Mixtral 8x7B and Mixtral 8x22B, released under the Apache 2.0 licence.

5 min read · Last updated May 2026 · Models

Mixtral is a family of large language models developed by the Paris-based AI laboratory Mistral AI. The models employ a sparse mixture-of-experts (SMoE) architecture, in which each transformer feed-forward block is replaced by eight parallel expert sub-networks and a learned router selects two experts per token. This design allows a model to have a large total parameter count while keeping the number of parameters activated for any single token relatively small, reducing both inference cost and memory bandwidth requirements compared with dense models of equivalent capability.

The first release, Mixtral 8x7B, appeared in December 2023 and contained approximately 47 billion total parameters with around 13 billion active per token. It demonstrated that an openly licensed SMoE model could match or exceed Llama 2 70B, a dense model with roughly five times as many active parameters, and GPT-3.5 on a range of standard benchmarks. The follow-up, Mixtral 8x22B, announced in April 2024, scaled the design to 141 billion total parameters with 39 billion active per token and a context window of 64K tokens.
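The 47B and 13B figures can be reproduced with simple arithmetic. The sketch below is illustrative only: the hyperparameters are those reported for Mixtral 8x7B in Jiang et al. (2024), not values stated in this article, and the function name is hypothetical.

```python
# Hyperparameters as reported for Mixtral 8x7B in Jiang et al. (2024).
# They are assumptions here: this article does not state them.
DIM = 4096          # model (embedding) width
N_LAYERS = 32
HIDDEN_DIM = 14336  # inner width of each expert feed-forward network
N_HEADS = 32
N_KV_HEADS = 8      # grouped-query attention uses fewer key/value heads
HEAD_DIM = 128
VOCAB = 32000
N_EXPERTS = 8
TOP_K = 2


def mixtral_param_counts():
    """Return (total, active_per_token) parameter counts."""
    # Query and output projections are DIM x (N_HEADS * HEAD_DIM);
    # key and value projections are smaller because of grouped-query attention.
    attn = 2 * DIM * N_HEADS * HEAD_DIM + 2 * DIM * N_KV_HEADS * HEAD_DIM
    expert = 3 * DIM * HIDDEN_DIM        # gated FFN: three weight matrices
    router = DIM * N_EXPERTS             # one linear layer scoring the experts
    norms = 2 * DIM                      # two normalisation layers per block
    shared_per_layer = attn + router + norms
    embeddings = 2 * VOCAB * DIM + DIM   # input embedding, output head, final norm
    total = N_LAYERS * (shared_per_layer + N_EXPERTS * expert) + embeddings
    active = N_LAYERS * (shared_per_layer + TOP_K * expert) + embeddings
    return total, active


total, active = mixtral_param_counts()
print(f"total:  {total / 1e9:.1f}B")
print(f"active: {active / 1e9:.1f}B ({active / total:.0%} of total)")
```

The total is well below 8 × 7B = 56B because only the feed-forward experts are replicated eight times; attention, embeddings, and normalisation exist once.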

Architecture

In a Mixtral layer, the standard transformer feed-forward network is replaced by a routing mechanism and a set of eight expert FFNs. For each input token, the router produces a probability distribution over experts; the top two experts are selected, their outputs are combined by a weighted sum, and the result is passed to the next layer. The other transformer components (embeddings, attention layers, and normalisation) are shared and run for every token. Because only two of the eight experts are activated per token, the compute cost per forward pass is roughly that of a dense model with a little over a quarter of the total parameters: about 13B of 47B for Mixtral 8x7B, slightly more than two-eighths because the shared components are always active.
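The routing step described above can be sketched for a single token in plain Python. This is a toy illustration, not Mistral AI's implementation: the function and argument names are hypothetical, the router is assumed to be a single linear layer, and the softmax is taken over the two selected logits only, following the formulation in Jiang et al. (2024).

```python
import math


def top2_moe_layer(x, router_weights, experts):
    """Route one token vector through its two highest-scoring experts.

    x: list of floats (the token's hidden state)
    router_weights: one weight vector per expert (a linear router, assumed)
    experts: list of callables, each mapping a vector to a vector
    """
    # Router logits: one dot product per expert.
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]

    # Indices of the two largest logits, best first.
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]

    # Softmax over the two selected logits only, so the gates sum to 1.
    peak = max(logits[i] for i in top2)
    exps = [math.exp(logits[i] - peak) for i in top2]
    gates = [e / sum(exps) for e in exps]

    # Weighted sum of the two expert outputs; the other experts never run.
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top2):
        y = experts[idx](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, top2, gates


# Toy usage: eight "experts" that simply scale their input by 0..7.
toy_experts = [lambda v, k=k: [k * t for t in v] for k in range(8)]
toy_router = [[float(i), 0.0] for i in range(8)]
output, chosen, gates = top2_moe_layer([1.0, 0.0], toy_router, toy_experts)
print(chosen, [round(g, 3) for g in gates])
```

In a real model the same computation is batched across all tokens, which is where the routing and load-balancing complexity comes from.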

This sparsity introduces engineering complexity. The total model weights must reside in GPU memory even though only a subset is used at each step, so the memory footprint resembles that of a dense model of full size. Batch routing and load balancing during training require auxiliary losses to ensure all experts are utilised, preventing degenerate solutions in which the router collapses to one or two experts.
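To make the idea of an auxiliary balancing loss concrete, here is a minimal sketch of one common formulation, the one used in the Switch Transformer line of work. The article does not say which auxiliary loss Mixtral was trained with, so treat this as an example of the technique rather than Mixtral's actual objective; all names are hypothetical.

```python
def load_balancing_loss(router_probs, assignments, n_experts):
    """Switch-Transformer-style auxiliary loss: n_experts * sum_i(f_i * p_i).

    router_probs: per token, a list of n_experts router probabilities
    assignments: per token, the list of expert indices chosen for it
    """
    n_tokens = len(router_probs)
    n_slots = sum(len(chosen) for chosen in assignments)

    # f_i: fraction of routing slots dispatched to expert i.
    f = [0.0] * n_experts
    for chosen in assignments:
        for i in chosen:
            f[i] += 1.0 / n_slots

    # p_i: mean router probability given to expert i across the batch.
    p = [sum(tok[i] for tok in router_probs) / n_tokens for i in range(n_experts)]

    return n_experts * sum(f_i * p_i for f_i, p_i in zip(f, p))


# Balanced routing over four experts versus a router collapsed onto two.
balanced = load_balancing_loss(
    [[0.25] * 4] * 4, [[0, 1], [2, 3], [0, 1], [2, 3]], 4
)
collapsed = load_balancing_loss(
    [[0.5, 0.5, 0.0, 0.0]] * 4, [[0, 1]] * 4, 4
)
print(balanced, collapsed)
```

The loss is smallest when both the dispatch fractions and the router probabilities are uniform, so adding it to the training objective with a small coefficient penalises the collapsed case.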

Capabilities

Mixtral 8x22B supports native function calling and constrained output, features useful for building tool-using agents and structured-data pipelines. It is fluent in English, French, German, Spanish, and Italian, and supports mathematical reasoning and code generation. Reported benchmark figures from Mistral place it ahead of Command R+ and Llama 2 70B on MMLU, GSM8K, and HumanEval while requiring fewer active parameters per token.
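On the application side, function calling usually means the model emits a structured request that the host program validates and executes. The sketch below shows that host-side step only. The JSON shape (`name` plus `arguments`), the tool, and the function names are hypothetical illustrations, not the wire format of the Mistral API.

```python
import json


def get_exchange_rate(base, quote):
    """Hypothetical tool; returns a placeholder instead of live data."""
    return {"base": base, "quote": quote, "rate": None}


# Registry of tools the application is willing to run.
TOOLS = {"get_exchange_rate": get_exchange_rate}


def dispatch_tool_call(raw):
    """Validate a JSON tool call emitted by a model and run the matching tool."""
    call = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(call, dict):
        raise TypeError("tool call must be a JSON object")
    name, args = call.get("name"), call.get("arguments")
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise TypeError("arguments must be a JSON object")
    return TOOLS[name](**args)


result = dispatch_tool_call(
    '{"name": "get_exchange_rate", "arguments": {"base": "MYR", "quote": "USD"}}'
)
print(result)
```

Constrained output makes this step more reliable, because the model's reply is far more likely to parse as valid JSON, but the host should still validate tool names and arguments before executing anything.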

| Model | Total params | Active params | Context | Released |
|---|---|---|---|---|
| Mixtral 8x7B | ~47B | ~13B | 32K | Dec 2023 |
| Mixtral 8x22B | 141B | 39B | 64K | Apr 2024 |

The Apache 2.0 licence permits commercial use, modification, and redistribution, which has made Mixtral popular as a base for fine-tuned derivatives hosted on Hugging Face and as a self-hosted alternative to closed APIs in regulated industries.

Deployment

Mixtral models are available through the Mistral platform API and as raw weights for self-hosting. Hugging Face hosts the weights and provides loaders compatible with its Transformers library. Quantised variants (for example 4-bit GPTQ or AWQ builds) reduce the memory footprint for deployment on smaller GPU clusters. Mixtral models have also been offered through managed services including Amazon Bedrock, Azure AI Foundry, and Google Vertex AI.
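A rough sizing calculation shows why quantisation matters for self-hosting. The sketch below estimates memory for the weights alone from the total parameter counts given in this article. It ignores the KV cache, activations, and per-format quantisation overhead, so real requirements are higher; the function name is hypothetical.

```python
def weight_memory_gb(total_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


# Total parameter counts as quoted in this article (rounded).
MODELS = {"Mixtral 8x7B": 47e9, "Mixtral 8x22B": 141e9}

for name, params in MODELS.items():
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{name}: ~{fp16:.0f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

Note that the calculation uses total rather than active parameters: every expert must be loaded even though only two run per token.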

Malaysian Context — Mixtral in Local Deployments

Mixtral has been adopted by Malaysian organisations seeking open-weight alternatives to closed proprietary models, particularly in sectors with data-residency or regulatory constraints. Bank Negara Malaysia's (BNM) Risk Management in Technology (RMiT) policy and the Personal Data Protection Act (PDPA) push regulated entities toward models that can be run within Malaysian or ASEAN jurisdictions; self-hosted Mixtral instances on local cloud providers such as TM ONE, YTL Data Center, and AIMS Cyberjaya can satisfy this requirement.

Several systems integrators in the Cyberjaya and Bayan Lepas technology corridors offer Mixtral-based retrieval-augmented generation deployments for legal, healthcare, and government customers. These deployments are typically hosted on NVIDIA H100 or H200 clusters operated by domestic data centre providers, with the model weights residing entirely within Malaysian infrastructure.

Universiti Malaya, Universiti Sains Malaysia, and Multimedia University have used Mixtral 8x7B and 8x22B for academic research on Malay-English code-switching, Bahasa Malaysia summarisation, and Southeast Asian language adaptation. MyDIGITAL Corporation and the National AI Office, the latter established in December 2024, have profiled open-weight LLMs including Mixtral in policy documents on sovereign AI capability.

For startups, Mixtral lowers the cost of building AI products under the MDEC Digital Catalyst and Pemangkin programmes. AITG Sdn Bhd and other domestic platform providers route some inference workloads through Mixtral as a fallback or cost-optimised tier alongside Claude, Gemini, and GPT-class models.

References

Jiang, A. Q., et al. (2024). Mixtral of Experts. arXiv:2401.04088.
Mistral AI. (2024). Cheaper, Better, Faster, Stronger — Continuing to push the frontier of AI and making it accessible to all. mistral.ai/news/mixtral-8x22b.
Shazeer, N., et al. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538.
Bank Negara Malaysia. (2023). Risk Management in Technology (RMiT) Policy Document. https://www.bnm.gov.my.

Tags: mistral-ai, mixture-of-experts, open-source, large-language-model

Developer	Mistral AI (France)
First release	Mixtral 8x7B — December 2023
Architecture	Sparse Mixture of Experts (SMoE)
Largest model	Mixtral 8x22B (141B total, 39B active)
Context window	32K (8x7B), 64K (8x22B)
Licence	Apache 2.0