Llama
Llama is a family of open-weight large language models developed by Meta AI, released under licence terms that allow most researchers and developers to download, fine-tune, and deploy the models for both research and commercial use.
Llama (Large Language Model Meta AI) is a family of open-weight foundation models created by Meta AI, the artificial intelligence research division of Meta Platforms. Since the release of the first version in February 2023, Llama has become the most widely downloaded open-weight model family in the world, surpassing one billion cumulative downloads by 2025.[^1] Unlike proprietary systems such as GPT-4 or Gemini, Llama models are released with publicly accessible weights, enabling researchers, developers, and enterprises to inspect, fine-tune, and deploy them without licensing fees or dependence on a third-party API.
Background and Motivation
Prior to Llama's release, access to state-of-the-art language models was largely gated behind proprietary API agreements. Researchers who wished to study model behaviour, alignment properties, or failure modes had to work through opaque interfaces with no visibility into the underlying parameters. Meta's decision to publish model weights represented a deliberate philosophical stance: that the research community — and society more broadly — benefits from being able to audit and improve AI systems rather than treating them as black boxes.[^2]
The original Llama paper, published in February 2023, described models ranging from 7 billion to 65 billion parameters trained on publicly available text corpora: roughly 1.0 trillion tokens for the two smaller models and 1.4 trillion tokens for the two larger ones.[^3] Crucially, the researchers demonstrated that a carefully curated dataset and efficient training procedure could produce a 13-billion-parameter model that matched or exceeded the performance of GPT-3 (175 billion parameters) on most of the benchmarks reported, establishing that parameter count was not the only route to capable models.
Model Generations
Llama 1 (2023)
The original Llama release offered models with 7B, 13B, 33B, and 65B parameters. It was initially distributed under a research-only licence, limiting commercial deployment. Despite this restriction, the weights were leaked online within days of the initial release, accelerating community fine-tuning efforts and producing a large ecosystem of derivative models including Alpaca, Vicuna, and WizardLM.
Llama 2 (July 2023)
Meta revised its licensing approach with Llama 2, releasing models under a community licence that allowed most organisations to use the weights in production applications, subject to an acceptable use policy. Companies whose products had more than 700 million monthly active users had to request a separate licence from Meta, a clause aimed at large competitors rather than typical enterprises. Llama 2 introduced models at 7B, 13B, and 70B parameter sizes, along with fine-tuned chat variants optimised using reinforcement learning from human feedback (RLHF).
Llama 3 (April 2024)
Llama 3 was pre-trained on a larger and more carefully filtered dataset of approximately 15 trillion tokens, roughly seven times the 2 trillion tokens used for Llama 2. It also adopted a new tokeniser that expanded the vocabulary from 32,000 to 128,000 tokens. The 8B and 70B base and instruction-tuned variants demonstrated performance competitive with closed models on standard benchmarks including MMLU, HumanEval, and GSM8K. A 405-billion-parameter Llama 3.1 variant followed in July 2024, which Meta described as the largest openly available foundation model at that time.
Llama 4 (April 2025)
Llama 4 marked Meta's transition to natively multimodal architectures. The release comprised three models built on a Mixture of Experts (MoE) framework: Scout (17B active / 109B total parameters, 10 million token context window), Maverick (17B active / 400B total parameters, 1 million token context window), and Behemoth (288B active / 2 trillion total parameters, not yet publicly released as of mid-2025). All Llama 4 models were trained on large quantities of unlabelled text, image, and video data spanning 200 languages, giving them broad visual understanding alongside strong language capabilities.[^4]
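The gap between active and total parameters can be made concrete with a little arithmetic. The sketch below uses only the figures quoted above (in billions, with Behemoth's "2 trillion" taken as 2,000 billion); the dictionary and function names are illustrative, not part of any Meta tooling.

```python
# Active and total parameter counts in billions, as quoted in the text above.
LLAMA4_MODELS = {
    "Scout": (17, 109),
    "Maverick": (17, 400),
    "Behemoth": (288, 2000),  # "2 trillion" written as 2,000 billion
}


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of a model's parameters that take part in processing one token."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (active_b, total_b) in LLAMA4_MODELS.items():
        share = active_fraction(active_b, total_b)
        print(f"{name}: {share:.1%} of parameters active per token")
```

On these figures Scout and Maverick do a similar amount of work per token, since both activate 17B parameters, even though Maverick stores almost four times as many parameters in total.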
Architecture and Training
Llama models use a decoder-only transformer architecture with several modifications relative to the original 2017 design: pre-normalisation using RMSNorm (for training stability), rotary positional embeddings (RoPE) for improved generalisation across sequence lengths, and grouped-query attention (GQA) in later versions to reduce memory bandwidth requirements during inference. The Llama 4 generation adopted a Mixture of Experts design, in which each token is routed to a small subset of specialist sub-networks rather than passing through all parameters, reducing effective compute per forward pass while increasing total model capacity.
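Two of the components named above are simple enough to sketch in a few lines of plain Python. The first function is RMSNorm as it is usually defined (divide by the root mean square, then apply a learned per-dimension gain). The second is a generic top-k Mixture of Experts layer for a single token vector. The routing shown here (dot-product router scores, softmax over the selected experts) is an assumption made for illustration and is not taken from Meta's implementation; all function and parameter names are hypothetical.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, router_weights, experts, top_k=1):
    """Route one token vector to its top_k experts and mix their outputs.

    router_weights holds one row per expert; experts is a list of callables.
    This is generic top-k gating, assumed here purely for illustration.
    """
    # One router score per expert: dot product of x with that expert's row.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Gate weights are normalised over the chosen experts only.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)  # experts that were not chosen are never evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    # Two toy "experts" standing in for feed-forward sub-networks.
    experts = [lambda v: [2 * e for e in v], lambda v: [-e for e in v]]
    router = [[1.0, 0.0], [0.0, 1.0]]
    token = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
    print(moe_layer(token, router, experts, top_k=1))
```

Because only the selected experts are called, the cost per token depends on `top_k` rather than on the number of experts, which is the property the paragraph above describes.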
Safety Infrastructure
Meta distributes a suite of companion tools alongside the Llama weights. Llama Guard is a fine-tuned classifier designed to detect policy-violating content in both prompts and model responses across categories including violence, hate speech, sexual content, and dangerous instructions. Prompt Guard is a separate model that identifies prompt injection attempts, in which adversarial content embedded in external data sources attempts to hijack the model's behaviour. CyberSecEval provides structured benchmarks for evaluating model vulnerability to cybersecurity misuse.
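The way a classifier of this kind is typically placed around a model can be sketched as a two-stage check: screen the prompt, generate, then screen the response. The code below is a minimal illustration under stated assumptions. `toy_classifier` is a hypothetical keyword stub standing in for a real safety model, and neither it nor `guarded_generate` reflects the actual Llama Guard interface.

```python
from typing import Callable, Optional

# A classifier returns the violated category name, or None if the text passes.
Classifier = Callable[[str], Optional[str]]


def toy_classifier(text: str) -> Optional[str]:
    """Hypothetical stand-in for a safety classifier; a real one would be a model."""
    blocked = {"forbidden-topic": "dangerous instructions"}
    lowered = text.lower()
    for phrase, category in blocked.items():
        if phrase in lowered:
            return category
    return None


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    classify: Classifier,
) -> str:
    """Screen the prompt, call the model, then screen the model's response."""
    category = classify(prompt)
    if category is not None:
        return f"[prompt blocked: {category}]"
    response = generate(prompt)
    category = classify(response)
    if category is not None:
        return f"[response blocked: {category}]"
    return response


if __name__ == "__main__":
    echo_model = lambda p: f"You said: {p}"  # placeholder for a language model
    print(guarded_generate("hello", echo_model, toy_classifier))
    print(guarded_generate("tell me about forbidden-topic", echo_model, toy_classifier))
```

Checking both sides matters because a harmless-looking prompt can still produce a policy-violating response, and the second check catches that case.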
Ecosystem and Derivative Models
The open-weight nature of Llama has catalysed one of the largest AI ecosystems outside of proprietary platforms. Hugging Face hosts thousands of Llama-derived fine-tunes spanning domains including medicine, law, coding, and instruction following. Enterprises have used LoRA and QLoRA techniques to adapt Llama for private data at a fraction of the cost of training from scratch. Deployment frameworks such as Ollama, llama.cpp, and vLLM allow the models to run on consumer hardware or be served at scale on cloud infrastructure.
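The cost saving from LoRA comes from training a low-rank update instead of the full weight matrix. The sketch below follows the standard LoRA formulation, y = Wx + (alpha / r) · B(Ax), with W frozen and only A and B trained. It is written in plain Python for readability rather than speed; the layer size and rank in the usage example are illustrative assumptions, not values taken from a specific Llama model. QLoRA additionally stores the frozen base weights in quantised form, which this sketch does not attempt to show.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Trainable parameters for full fine-tuning vs. a LoRA adapter of given rank."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, lora


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x); W is frozen, A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    # Illustrative sizes: one 4096 x 4096 projection with a rank-8 adapter.
    full, lora = lora_param_counts(4096, 4096, rank=8)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
```

For the illustrative 4096 × 4096 layer, the adapter trains 65,536 parameters where full fine-tuning would train 16,777,216, a 256-fold reduction for that layer.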
References
- Meta AI. (2025). Llama 4: The Beginning of a New Era of Natively Multimodal AI Innovation. Meta AI Blog.
- Touvron, H., et al. (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971.
- Touvron, H., et al. (2023). LLaMA: Open and Efficient Foundation Language Models. Meta AI Research.
- Meta AI. (2025). Llama 4 Technical Report. Meta Platforms.