Cerebras Systems
An American AI hardware company that builds wafer-scale processors, using an entire silicon wafer as a single chip to accelerate deep learning training and to deliver very high-speed large language model inference.
Cerebras Systems is an American semiconductor and artificial intelligence company headquartered in Sunnyvale, California, known for building the largest computer chips in the industry. Rather than dividing a silicon wafer into many small chips in the conventional way, Cerebras manufactures a single processor that occupies almost an entire wafer. This wafer-scale approach is designed to accelerate the training of deep learning models and, more recently, to deliver very high-speed inference for large language models.
Wafer-scale engine
The company's flagship product is the Wafer Scale Engine, or WSE. A standard chip fabrication process patterns many separate dies onto one circular silicon wafer; the dies are then cut apart and packaged individually. Cerebras instead keeps the wafer intact and treats it as one enormous processor, connecting its many cores with high-bandwidth on-chip links. The advantage is that data can move between cores without leaving the chip, avoiding the slower and more power-hungry off-chip communication that limits clusters of separate processors.
The third-generation WSE-3, built on the TSMC 5-nanometre process, integrates about 4 trillion transistors, roughly 900,000 AI-optimised cores, and 44 gigabytes of on-chip SRAM. It is rated at around 125 petaflops of peak AI performance, and Cerebras quotes its aggregate on-chip memory bandwidth at about 21 petabytes per second. Keeping large amounts of memory on the same piece of silicon as the compute cores is central to the design, because memory bandwidth, rather than raw arithmetic throughput, is often the true bottleneck in AI workloads.
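To give a feel for the scale of these figures, the short Python sketch below divides the quoted totals by the quoted core count and checks whether a model's weights alone would fit in 44 gigabytes. It is only back-of-envelope arithmetic: the per-core values are plain averages and say nothing about how memory is actually organised on the chip, the gigabyte is taken as 10^9 bytes, and the model sizes and the helper names per_core_averages and fits_on_chip are hypothetical choices made for illustration.

```python
# Figures quoted in the text for the WSE-3 (all approximate).
TRANSISTORS = 4e12             # about 4 trillion transistors
CORES = 900_000                # roughly 900,000 cores
ON_CHIP_MEMORY_BYTES = 44e9    # 44 gigabytes, assumed to mean decimal GB


def per_core_averages():
    """Return (bytes of memory per core, transistors per core) as plain averages."""
    return ON_CHIP_MEMORY_BYTES / CORES, TRANSISTORS / CORES


def fits_on_chip(num_params, bytes_per_param):
    """True if the weights alone fit in on-chip memory.

    Ignores activations, caches, and any other runtime state.
    """
    return num_params * bytes_per_param <= ON_CHIP_MEMORY_BYTES


if __name__ == "__main__":
    memory_per_core, transistors_per_core = per_core_averages()
    print(f"average memory per core: {memory_per_core / 1e3:.1f} kB")
    print(f"average transistors per core: {transistors_per_core / 1e6:.1f} million")

    # Hypothetical model sizes, stored at 2 bytes per parameter (16-bit weights).
    for params in (8e9, 70e9):
        print(f"{params / 1e9:.0f}B parameters fit on one wafer: {fits_on_chip(params, 2)}")
```

On these assumptions each core has, on average, a little under 50 kilobytes of memory next to it, and a model of a few billion 16-bit parameters fits on one wafer while one of tens of billions does not.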
From training to inference
Cerebras originally positioned wafer-scale hardware for training, where the ability to hold a large model on one chip simplifies programming compared with distributing it across many GPUs. More recently the company has emphasised inference, marketing what it describes as some of the fastest large language model serving available. By holding model weights in fast on-chip memory, Cerebras systems can generate output tokens at rates well above typical GPU-based systems. The company has reported serving large open-weight models at thousands of tokens per second per user, figures that in its published benchmarks exceed those of contemporary flagship GPU systems on the same models.
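The link between memory bandwidth and token rate can be illustrated with a common rule of thumb: when generating one token at a time for a single user, a dense model has to read every weight once per token, so the token rate cannot exceed memory bandwidth divided by the size of the weights. The Python sketch below applies that bound. It is not a description of how Cerebras or any GPU vendor measures performance; the function name, the model size and both bandwidth values are hypothetical placeholders chosen only to show how the bound scales, and the estimate ignores compute time, key-value cache traffic, batching and inter-chip communication.

```python
def max_decode_tokens_per_second(num_params, bytes_per_param, memory_bandwidth_bytes_per_s):
    """Upper bound on single-stream decode rate for a dense model.

    Assumes every weight is read from memory exactly once per generated
    token and that nothing else limits throughput.
    """
    bytes_read_per_token = num_params * bytes_per_param
    return memory_bandwidth_bytes_per_s / bytes_read_per_token


if __name__ == "__main__":
    # Hypothetical model: 8 billion parameters at 2 bytes each (16-bit weights).
    params, width = 8e9, 2

    # Hypothetical bandwidths, NOT vendor specifications:
    # one figure typical in order of magnitude for off-chip memory,
    # one several hundred times higher to stand in for on-chip memory.
    scenarios = {
        "off-chip memory (illustrative 3e12 B/s)": 3e12,
        "on-chip memory (illustrative 1e15 B/s)": 1e15,
    }
    for label, bandwidth in scenarios.items():
        bound = max_decode_tokens_per_second(params, width, bandwidth)
        print(f"{label}: at most {bound:,.1f} tokens/s")
```

Because the bound is linear in bandwidth, any factor gained in memory bandwidth carries over directly to the ceiling on per-user token rate, which is the reasoning behind keeping weights in on-chip memory.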
The single-chip Wafer Scale Engine is packaged into a computer system called the CS-3, and multiple CS-3 units can be combined for larger workloads. Cerebras also offers access to its hardware through a cloud inference service, so customers can use the speed advantage without purchasing systems outright.
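As a rough illustration of why several units might be combined, the Python sketch below counts how many systems would be needed if a model's weights had to sit entirely in on-chip memory, using the 44-gigabyte WSE-3 figure and one wafer per CS-3. This is a capacity count only, under stated assumptions: the text does not describe how Cerebras software actually partitions models across units, and the function name systems_needed, the usable_fraction parameter and the example model sizes are hypothetical.

```python
import math

ON_CHIP_MEMORY_BYTES = 44e9  # per wafer, from the WSE-3 figure; assumed decimal GB


def systems_needed(num_params, bytes_per_param, usable_fraction=1.0):
    """Smallest number of single-wafer systems whose on-chip memory holds the weights.

    usable_fraction is a hypothetical knob for reserving part of each
    wafer's memory for activations and other runtime state.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    weight_bytes = num_params * bytes_per_param
    return math.ceil(weight_bytes / (ON_CHIP_MEMORY_BYTES * usable_fraction))


if __name__ == "__main__":
    # Hypothetical model sizes at 2 bytes per parameter (16-bit weights).
    for params in (8e9, 70e9):
        count = systems_needed(params, 2)
        print(f"{params / 1e9:.0f}B parameters: {count} system(s)")
```

Lowering bytes_per_param (for example, by quantising weights to 8 bits) or raising usable_fraction reduces the count, which is the kind of trade-off a deployment would weigh against accuracy and runtime memory needs.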
Position in the market
Cerebras is one of several companies challenging the dominance of conventional GPU clusters in AI computing, competing with specialised inference providers and with the mainstream accelerator market. Its distinctive bet is that radical integration at the wafer level, rather than networking many smaller chips, is the more efficient path for certain AI workloads. The main trade-offs are manufacturing complexity, the cost of a wafer-scale device, and the need for software that maps models onto an unusual architecture. Despite these challenges, the company has attracted significant customers in research, supercomputing and enterprise inference.