DeepSeek
A Chinese artificial intelligence company founded in 2023, known for developing open-source large language models including DeepSeek-R1 and DeepSeek-V3 that achieved performance competitive with leading Western AI systems.
DeepSeek is a Chinese artificial intelligence research company headquartered in Hangzhou, China. It was founded in July 2023 by Liang Wenfeng, the co-founder and chief investment officer of High-Flyer, a quantitative hedge fund. Despite its relatively brief existence, DeepSeek attracted global attention in early 2025 when it released a family of open-source large language models that demonstrated performance competitive with, and on some benchmarks exceeding, models from OpenAI and Anthropic. The company reported significantly lower training costs than its competitors and said it had trained the models on hardware limited by export restrictions.
DeepSeek's emergence reshaped discussions about the economics of frontier AI development and the extent to which leading capabilities could be reproduced outside the United States, triggering substantial movements in technology equity markets and prompting policy debate in several governments.
Background and Founding
High-Flyer, DeepSeek's parent company, had already built a significant GPU cluster for its quantitative trading operations before pivoting into AI research. Liang Wenfeng is reported to have founded DeepSeek as a dedicated AI research lab that could draw on this infrastructure, motivated by both commercial interest and a view that China needed to develop independent AI capabilities. DeepSeek has been described as a core component of China's broader strategy to develop advanced AI without dependence on foreign technology supply chains.[^1]
Key Models
DeepSeek-V3
Released in late 2024, DeepSeek-V3 is a Mixture-of-Experts (MoE) model with 671 billion total parameters, of which roughly 37 billion are activated for each token. In this sparse architecture only a subset of parameters is activated for any given input, which improves computational efficiency. DeepSeek reported that the final training run used about 2.79 million H800 GPU-hours, which it costed at roughly US$5.6 million, a fraction of the estimated cost of comparable Western models; the claim attracted both significant attention and scrutiny. The model's performance on coding, mathematics, and reasoning benchmarks was competitive with GPT-4o and Claude 3.5 Sonnet.[^2]
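The sparse activation described above can be illustrated with a generic top-k routing sketch. This is not DeepSeek's implementation: the expert count, the value of k, the toy experts, and all function names are assumptions chosen for illustration. A router scores every expert, only the k highest-scoring experts run, and their outputs are mixed using renormalised weights.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalised so that the selected experts sum to 1.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Evaluate only the selected experts and mix their outputs."""
    selected = route_top_k(router_scores, k)
    output = sum(w * experts[i](x) for i, w in selected)
    return output, [i for i, _ in selected]


# Hypothetical toy setup: 8 experts, 2 active per input.
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]

rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
output, active = moe_forward(3.0, experts, scores, TOP_K)

print(f"active experts: {sorted(active)} of {NUM_EXPERTS}")
print(f"fraction of experts evaluated: {TOP_K / NUM_EXPERTS:.0%}")
```

In a real model the router scores come from a learned linear layer applied to each token, and the experts are feed-forward networks rather than scalar functions; the sketch only shows why compute per input scales with k rather than with the total number of experts.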
DeepSeek-R1
Unveiled on 20 January 2025, DeepSeek-R1 is an open-source reasoning model with 671 billion parameters. It was trained using reinforcement learning to generate extended chain-of-thought reasoning traces before producing final answers, a methodology similar to that of OpenAI's o1 series. On the AIME 2024 (American Invitational Mathematics Examination) benchmark, R1 achieved 79.8% Pass@1, and on the MATH-500 benchmark it scored 97.3%, results that placed it roughly level with OpenAI's o1 model at the time of release.[^3]
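Pass@1 measures how often a single sampled answer is correct. A common way to estimate it, sketched below, is to sample several answers per problem, take the fraction that are correct for each problem, and average those fractions across problems. The function name and the toy data are assumptions for illustration; the article does not state the sampling settings DeepSeek used.

```python
def pass_at_1(results):
    """Estimate Pass@1 from graded samples.

    results: one inner list per problem, holding one bool per sampled
    answer (True if that answer was graded correct).
    """
    if not results:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in results:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # Fraction of this problem's samples that were correct.
        per_problem.append(sum(samples) / len(samples))
    # Average the per-problem fractions.
    return sum(per_problem) / len(per_problem)


# Hypothetical graded outputs: 3 problems, 4 samples each.
toy = [
    [True, True, True, True],      # always solved
    [True, False, True, False],    # solved half the time
    [False, False, False, False],  # never solved
]
score = pass_at_1(toy)
print(f"Pass@1 = {score:.1%}")  # prints "Pass@1 = 50.0%"
```

Averaging over several samples reduces the variance that comes from evaluating a stochastic model on a small benchmark with only one attempt per problem.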
R1 was released under an MIT licence, permitting free commercial and academic use. This open-source strategy, unusual for a frontier model, drove rapid adoption: R1 became one of the most downloaded models on Hugging Face within weeks of release.
DeepSeek-V3.1 and V4
DeepSeek-V3.1, released in August 2025, is a hybrid model combining elements of V3 and R1: it can reason at length when required while still responding quickly to straightforward queries. DeepSeek-V4, previewed in April 2026, was released in two variants: V4-Pro (described by the company as rivalling the world's top closed-source models) and V4-Flash (a smaller, cheaper variant designed for cost-sensitive applications). V4 is notably optimised for Huawei's Ascend chips, a strategic choice that reduces dependence on Nvidia hardware subject to US export controls.[^4]
Technical Innovations
DeepSeek has introduced several technical contributions beyond raw benchmark performance. Its Multi-head Latent Attention (MLA) mechanism reduces the key-value cache size during inference, lowering memory costs significantly. DeepSeekMoE proposes a finer-grained MoE architecture with more specialised expert routing compared with standard implementations. The FP8 mixed-precision training framework allows training on a broader range of hardware and reduces memory bandwidth requirements. These innovations suggest a systematic engineering approach oriented towards efficiency under compute constraints.
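The memory saving from a compressed key-value cache can be shown with back-of-the-envelope arithmetic. The sketch below compares a standard multi-head attention cache, which stores full keys and values for every head, with a latent-style cache that stores one compressed vector plus a small positional component per token. All dimensions are illustrative assumptions rather than a published DeepSeek configuration, and the function names are hypothetical.

```python
def kv_elems_standard(n_heads, head_dim):
    """Cached elements per token per layer: full keys and values for every head."""
    return 2 * n_heads * head_dim


def kv_elems_latent(latent_dim, rope_dim):
    """Cached elements per token per layer: one compressed latent vector
    plus a small decoupled positional key."""
    return latent_dim + rope_dim


def cache_bytes(elems_per_token_layer, n_layers, seq_len, bytes_per_elem=2):
    """Total cache size for one sequence, assuming 2-byte elements by default."""
    return elems_per_token_layer * n_layers * seq_len * bytes_per_elem


# Illustrative (assumed) dimensions.
N_HEADS, HEAD_DIM = 128, 128
LATENT_DIM, ROPE_DIM = 512, 64
N_LAYERS, SEQ_LEN = 60, 32_768

standard = kv_elems_standard(N_HEADS, HEAD_DIM)
latent = kv_elems_latent(LATENT_DIM, ROPE_DIM)

GIB = 1024 ** 3
print(f"standard cache: {cache_bytes(standard, N_LAYERS, SEQ_LEN) / GIB:.1f} GiB")
print(f"latent cache:   {cache_bytes(latent, N_LAYERS, SEQ_LEN) / GIB:.1f} GiB")
print(f"reduction:      {standard / latent:.1f}x")
```

Under these assumptions the per-token cache shrinks from 32,768 elements per layer to 576, which is why a latent cache makes long contexts and large batch sizes cheaper to serve.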
Geopolitical Context
DeepSeek's rise has been closely watched in the context of US–China technology competition. US export controls on high-performance AI chips (particularly Nvidia's H100 and successor GPUs) were intended to slow Chinese AI development; DeepSeek's apparent ability to produce competitive models despite these restrictions prompted debate about the effectiveness of such policies. Simultaneously, DeepSeek's open-source release of R1 was welcomed by the global AI community as a democratising development but raised security concerns in some governments about data privacy and the Chinese legal environment governing domestic AI companies.
References
- TechTarget. (2025). DeepSeek explained: Everything you need to know. https://www.techtarget.com/whatis/feature/DeepSeek-explained-Everything-you-need-to-know
- Wikipedia. (2025). DeepSeek. https://en.wikipedia.org/wiki/DeepSeek
- Built In. (2025). What Is DeepSeek-R1? https://builtin.com/artificial-intelligence/deepseek-r1
- Fortune. (2026). DeepSeek unveils V4 model, with rock-bottom prices and close integration with Huawei's chips. https://fortune.com/2026/04/24/deepseek-v4-ai-model-price-performance-china-open-source/
- The Edge Malaysia. (2025). Is Malaysia ready to seize on China's open-source AI revolution? https://theedgemalaysia.com/node/754593