Tensor Processing Unit
A tensor processing unit (TPU) is a custom application-specific integrated circuit (ASIC) designed by Google to accelerate machine learning workloads, particularly neural network training and inference. Unlike general-purpose central processing units (CPUs) or graphics processing units (GPUs), TPUs are purpose-built for the linear algebra operations that dominate deep learning, including large-scale matrix multiplication, convolution, and embedding lookups.
History and generations
Google began deploying TPUs internally in 2015 to handle workloads such as Google Search ranking, Translate, and Photos. The first generation (TPU v1) was unveiled at Google I/O 2016 and supported inference only. Later generations handle both training and inference, each expanding compute capacity. TPU v2 (2017) introduced training support and bfloat16 arithmetic. TPU v3 (2018) added liquid cooling. TPU v4 (2021) introduced optical circuit switches for pod-level interconnect. TPU v5e and v5p (2023) targeted cost-efficient inference and high-end training respectively. The sixth-generation Trillium (TPU v6e) reached general availability in 2024, delivering approximately 4.7 times the peak compute per chip of v5e and doubling high-bandwidth memory capacity. In April 2025, Google unveiled Ironwood (TPU v7), available in 256-chip and 9,216-chip pod configurations and optimised for inference-heavy frontier model workloads.
Architecture
The core of a TPU is a systolic array, a two-dimensional grid of multiply-accumulate units that performs matrix multiplication by streaming operands rhythmically through the grid. Each unit passes its operands directly to its neighbours, so values are reused across the array instead of being re-read from memory for every multiplication, which yields very high throughput for dense linear algebra. From TPU v2 onwards, each chip integrates high-bandwidth memory stacks for fast access to weights and activations, along with on-chip vector and scalar units for non-matrix operations. Several generations also include dedicated SparseCore engines that accelerate embedding-heavy models such as recommendation systems.
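The data movement in a systolic array can be illustrated with a small cycle-by-cycle simulation. The sketch below is illustrative only: the function name systolic_matmul is hypothetical, and it assumes an output-stationary dataflow (each unit keeps one output element while operands stream past), chosen for simplicity and not necessarily the dataflow used in TPU hardware.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated n x m systolic grid.

    Assumption: output-stationary dataflow. Rows of `a` enter from the left
    edge, columns of `b` enter from the top edge, and each input stream is
    delayed by its row or column index so matching operands meet in a unit.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # accumulator held by each unit
    a_reg = [[0] * m for _ in range(n)]  # operand moving left to right
    b_reg = [[0] * m for _ in range(n)]  # operand moving top to bottom

    # The last product reaches the bottom-right unit at cycle k + n + m - 3.
    for t in range(k_dim + n + m - 2):
        # Visit units from the far corner so each one reads its neighbour's
        # value from the previous cycle before that neighbour is updated.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    k = t - i  # skewed feed on the left edge
                    a_in = a[i][k] if 0 <= k < k_dim else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j  # skewed feed on the top edge
                    b_in = b[k][j] if 0 <= k < k_dim else 0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += a_in * b_in  # one multiply-accumulate per cycle
    return acc
```

Each input element is read from the edge of the grid once and then handed from unit to unit, which is the property that lets a systolic array avoid repeated memory accesses.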
TPUs are deployed in pods that link many chips through a high-bandwidth inter-chip interconnect. A Trillium pod scales to 256 chips, while an Ironwood pod scales to 9,216 chips. In the largest pod designs, Google's optical circuit switches, introduced with TPU v4, connect groups of chips, allowing model and data parallelism across thousands of accelerators with low communication latency.
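Data parallelism across the chips of a pod can be sketched in a few lines. The example below is a simplified illustration rather than a description of Google's implementation: the function names are hypothetical, the "chips" are ordinary list slices, and the all-reduce step that the interconnect would perform is modelled as a plain average.

```python
def local_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values):
    """Stand-in for an interconnect all-reduce: every chip gets the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def data_parallel_gradient(w, xs, ys, num_chips):
    """Split a batch evenly across chips, then average the local gradients."""
    if len(xs) % num_chips != 0:
        raise ValueError("batch size must be divisible by num_chips")
    shard = len(xs) // num_chips
    grads = [
        local_gradient(w, xs[c * shard:(c + 1) * shard], ys[c * shard:(c + 1) * shard])
        for c in range(num_chips)
    ]
    return all_reduce_mean(grads)
```

With equal-sized shards, the averaged gradient equals the gradient computed over the whole batch, so every simulated chip applies the same weight update.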
Software stack
TPUs are programmed primarily through TensorFlow, JAX, and PyTorch via XLA (Accelerated Linear Algebra), a compiler that lowers high-level graphs into TPU-native code. Models trained on GPUs can generally be ported to TPUs by adjusting data pipelines and using bfloat16 mixed-precision training. Google Cloud exposes TPUs through Cloud TPU virtual machines and managed services such as Vertex AI and Google Kubernetes Engine.
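The bfloat16 format mentioned above keeps the sign bit and 8-bit exponent of a 32-bit float but only 7 explicit mantissa bits, so it corresponds to the upper 16 bits of the float32 encoding. The sketch below shows that relationship using only the Python standard library. The function names are hypothetical, and round-to-nearest-even is assumed as the rounding mode; frameworks and hardware may handle edge cases differently.

```python
import struct


def float32_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for x, rounding to nearest even.

    Values outside the float32 range raise OverflowError from struct.pack.
    """
    if x != x:  # NaN: return a canonical quiet NaN pattern
        return 0x7FC0
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add just under half of the discarded range, plus one more when the
    # kept part is odd, so that exact ties round to the even neighbour.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def round_to_bfloat16(x):
    """Round x to the nearest value representable in bfloat16."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(x))
```

Because the exponent width matches float32, the conversion preserves dynamic range and loses only precision, which is one reason bfloat16 is commonly paired with float32 accumulation in mixed-precision training.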
Comparison with GPUs
GPUs remain the dominant accelerator for general machine learning research because of mature tooling, broad framework support, and easier programmability. TPUs trade some flexibility for higher throughput and energy efficiency on the specific workloads they target, particularly transformer-based language models and recommendation systems. The chips power Google's internal services and frontier models including Gemini, and they are offered to external customers exclusively through Google Cloud.
Applications
TPUs accelerate training and serving of large language models, image and video generation systems, recommendation engines such as YouTube ranking, and scientific computing workloads in fields including protein structure prediction and weather forecasting. Researchers can also obtain access to TPUs through Google's TPU Research Cloud programme.
References
- Jouppi, N. et al. (2017). In-Datacenter Performance Analysis of a Tensor Processing Unit. ISCA 2017.
- Google Cloud. (2024). Trillium TPU is now generally available. cloud.google.com/blog.
- Google. (2025). Introducing Ironwood: the seventh-generation TPU. Google Cloud Next 2025.
- Ministry of Digital Malaysia. (2024). Google announces investment in Malaysian data centre and cloud region. digital.gov.my.