Mixtral
Mixtral is a family of open-weight sparse mixture-of-experts large language models developed by Mistral AI, comprising Mixtral 8x7B and Mixtral 8x22B, released under the Apache 2.0 licence.
Mixtral is a family of large language models developed by the Paris-based AI laboratory Mistral AI. The models employ a sparse mixture-of-experts (SMoE) architecture, in which each transformer feed-forward block is replaced by eight parallel expert sub-networks and a learned router selects two experts per token. This design allows a model to have a large total parameter count while keeping the number of parameters activated for any single token relatively small, reducing both inference cost and memory bandwidth requirements compared with dense models of equivalent capability.
The first release, Mixtral 8x7B, appeared in December 2023 and contained approximately 47 billion total parameters with around 13 billion active per token. It demonstrated that an openly licensed SMoE model could match or exceed the performance of much larger dense models such as Llama 2 70B and GPT-3.5 on a range of standard benchmarks. The follow-up Mixtral 8x22B, announced in April 2024, scaled the design to 141 billion total parameters with 39 billion active parameters per token and a context window of 64K tokens.
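The total and active figures quoted for Mixtral 8x7B can be roughly cross-checked by counting weights. The sketch below is illustrative only: the hyperparameters (hidden size, layer count, expert width, and so on) are those reported in the architecture table of the Mixtral paper (Jiang et al., 2024), not stated in this article, and it assumes no bias terms, untied input and output embeddings, and three weight matrices per expert.

```python
# Rough parameter count for Mixtral 8x7B.
# All hyperparameters below are assumptions taken from the Mixtral paper's
# architecture table; they are not stated in this article.
DIM = 4096          # model (hidden) dimension
N_LAYERS = 32       # transformer layers
HEAD_DIM = 128      # dimension per attention head
HIDDEN_DIM = 14336  # inner width of each expert FFN
N_HEADS = 32        # query heads
N_KV_HEADS = 8      # key/value heads (grouped-query attention)
VOCAB_SIZE = 32000
NUM_EXPERTS = 8
TOP_K = 2           # experts used per token


def count_params(experts_per_layer: int) -> int:
    """Count weights, including only `experts_per_layer` experts in each layer."""
    expert_ffn = 3 * DIM * HIDDEN_DIM  # assumed: gate, up and down projections
    attention = (2 * DIM * N_HEADS * HEAD_DIM        # query and output projections
                 + 2 * DIM * N_KV_HEADS * HEAD_DIM)  # key and value projections
    router = DIM * NUM_EXPERTS  # one score per expert
    norms = 2 * DIM             # two normalisation weight vectors per layer
    per_layer = experts_per_layer * expert_ffn + attention + router + norms
    embeddings = 2 * VOCAB_SIZE * DIM  # input embedding plus output head (assumed untied)
    final_norm = DIM
    return N_LAYERS * per_layer + embeddings + final_norm


total_params = count_params(NUM_EXPERTS)
active_params = count_params(TOP_K)

print(f"total:  {total_params / 1e9:.1f}B")                  # total:  46.7B
print(f"active: {active_params / 1e9:.1f}B")                 # active: 12.9B
print(f"active / total: {active_params / total_params:.2f}")  # active / total: 0.28
```

Under these assumptions the count lands close to the approximate 47 billion and 13 billion figures, and it shows why the name "8x7B" does not imply 56 billion parameters: only the feed-forward experts are replicated, while attention and embedding weights are counted once.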
Architecture
In a Mixtral layer, the standard transformer feed-forward network (FFN) is replaced by a router and a set of eight expert FFNs. For each input token, the router scores every expert; the two highest-scoring experts are selected, their outputs are combined in a sum weighted by the normalised router scores, and the result is passed to the next layer. The other transformer components (embeddings, attention layers, and normalisation) are not split into experts and are applied to every token. Because only two of the eight experts run for each token, the compute cost per token is roughly that of a dense model the size of the active parameter count. That is a little over a quarter of the total, since the always-active shared components push the fraction slightly above the 2/8 of expert weights alone.
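The routing step described above can be sketched in a few lines of plain Python. This is a toy illustration, not Mistral's implementation: the names `top2_moe`, `router_weights` and `make_expert` are hypothetical, the "experts" are simple scaling functions standing in for feed-forward networks, and the gate weights are assumed to come from a softmax over the two selected router scores.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(x: Vector,
             router_weights: Sequence[Vector],
             experts: Sequence[Expert],
             top_k: int = 2) -> Tuple[Vector, List[int], List[float]]:
    """Route one token vector through its top-k experts and mix their outputs."""
    # One router score (logit) per expert: dot product of a router row with x.
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Indices of the top-k experts by score, highest first.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Assumption: gate weights are a softmax over the selected scores only.
    gates = softmax([logits[i] for i in chosen])
    # Weighted sum of the selected experts' outputs; other experts never run.
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, chosen, gates


def make_expert(scale: float) -> Expert:
    """Toy expert that scales its input (stand-in for a feed-forward network)."""
    return lambda v: [scale * t for t in v]


experts = [make_expert(i + 1.0) for i in range(8)]
router_weights = [[0.1 * i, 0.0] for i in range(8)]  # hypothetical router matrix
token = [1.0, 2.0]

output, chosen, gates = top2_moe(token, router_weights, experts)
print(chosen)  # [7, 6]
```

In this toy setup only experts 7 and 6 are evaluated for the token, which is the source of the compute saving: the remaining six experts contribute no work for that token, even though their weights still have to be held in memory.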
This sparsity introduces engineering complexity. All of the model's weights must reside in GPU memory even though only a subset is used for each token, so the memory footprint resembles that of a dense model with the same total parameter count. During training, the router must also be kept balanced across each batch: an auxiliary load-balancing loss is typically added so that all experts are utilised, preventing degenerate solutions in which the router collapses to one or two experts.
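One common form of such an auxiliary loss comes from the Switch Transformer line of work; the sketch below illustrates that formulation and is not confirmed to be the loss used to train Mixtral. The function name `load_balancing_loss` and the toy routing data are hypothetical. The loss multiplies, for each expert, the fraction of routing assignments it receives by the mean router probability it is given, so it is smallest when both are uniform.

```python
from typing import List, Sequence


def load_balancing_loss(router_probs: Sequence[Sequence[float]],
                        assignments: Sequence[Sequence[int]],
                        num_experts: int) -> float:
    """Switch-style auxiliary loss: num_experts * sum_i(f_i * p_i).

    router_probs[t][i] is the router probability of expert i for token t;
    assignments[t] lists the experts actually selected for token t.
    """
    n_tokens = len(router_probs)
    # f_i: share of all routing assignments that went to expert i.
    counts = [0] * num_experts
    for chosen in assignments:
        for expert in chosen:
            counts[expert] += 1
    total_assignments = sum(counts)
    f = [c / total_assignments for c in counts]
    # p_i: mean router probability given to expert i over the batch.
    p = [sum(tok[i] for tok in router_probs) / n_tokens for i in range(num_experts)]
    return num_experts * sum(f_i * p_i for f_i, p_i in zip(f, p))


NUM_EXPERTS = 8

# Balanced batch: uniform probabilities, each expert selected equally often.
balanced_probs = [[1.0 / NUM_EXPERTS] * NUM_EXPERTS for _ in range(8)]
balanced_assign = [[t, (t + 1) % NUM_EXPERTS] for t in range(8)]

# Collapsed batch: every token sends all probability to experts 0 and 1.
collapsed_probs = [[0.5, 0.5] + [0.0] * (NUM_EXPERTS - 2) for _ in range(8)]
collapsed_assign = [[0, 1] for _ in range(8)]

balanced_loss = load_balancing_loss(balanced_probs, balanced_assign, NUM_EXPERTS)
collapsed_loss = load_balancing_loss(collapsed_probs, collapsed_assign, NUM_EXPERTS)
print(balanced_loss, collapsed_loss)  # 1.0 4.0
```

Adding a small multiple of this term to the training objective penalises the collapsed case (4.0 here) relative to the balanced one (1.0), which nudges the router towards using every expert.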
Capabilities
Mixtral 8x22B supports native function calling and constrained output, features useful for building tool-using agents and structured-data pipelines. It is fluent in English, French, German, Spanish, and Italian, and supports mathematical reasoning and code generation. Reported benchmark figures from Mistral place it ahead of Command R+ and Llama 2 70B on MMLU, GSM8K, and HumanEval while requiring fewer active parameters per token.
| Model | Total params | Active params | Context | Released |
|---|---|---|---|---|
| Mixtral 8x7B | ~47B | ~13B | 32K | Dec 2023 |
| Mixtral 8x22B | 141B | 39B | 64K | Apr 2024 |
The Apache 2.0 licence permits commercial use, modification, and redistribution, which has made Mixtral popular as a base for fine-tuned derivatives hosted on Hugging Face and as a self-hosted alternative to closed APIs in regulated industries.
Deployment
Mixtral models are available through the Mistral platform API and as downloadable weights for self-hosting. The weights are also distributed through Hugging Face, where they can be loaded with the Transformers library. Quantised variants (for example, 4-bit builds produced with GPTQ or AWQ) reduce the memory footprint for deployment on smaller GPU clusters. Mixtral models have also been integrated into managed cloud services, including Amazon Bedrock, Azure AI Foundry, and Google Vertex AI.
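The effect of quantisation on memory can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation under stated assumptions: it counts weights only, uses the approximate total parameter figures quoted in this article, and ignores the key-value cache, activations, and the per-group scale and zero-point overhead that real GPTQ or AWQ checkpoints carry. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(total_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


# Approximate total parameter counts quoted in this article.
MODELS = {
    "Mixtral 8x7B": 47_000_000_000,
    "Mixtral 8x22B": 141_000_000_000,
}

for name, params in MODELS.items():
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

On these assumptions Mixtral 8x22B needs about 282 GB for 16-bit weights and about 70.5 GB at 4 bits, while Mixtral 8x7B drops from about 94 GB to about 23.5 GB. Note that the estimate uses the total parameter count, not the active count, because every expert must be resident in memory.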
References
- Jiang, A. Q., et al. (2024). Mixtral of Experts. arXiv:2401.04088.
- Mistral AI. (2024). Cheaper, Better, Faster, Stronger — Continuing to push the frontier of AI and making it accessible to all. mistral.ai/news/mixtral-8x22b.
- Shazeer, N., et al. (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv:1701.06538.
- Bank Negara Malaysia. (2023). Risk Management in Technology (RMiT) Policy Document. https://www.bnm.gov.my.