What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Tensor Processing Unit

A tensor processing unit (TPU) is a custom application-specific integrated circuit developed by Google for accelerating machine learning workloads, particularly neural network training and inference.

Last updated May 2026 | Infrastructure

A tensor processing unit (TPU) is a custom application-specific integrated circuit (ASIC) designed by Google to accelerate the matrix-heavy computations that dominate modern neural networks. Unlike general-purpose central processing units (CPUs) or graphics processing units (GPUs), TPUs are purpose-built for the linear algebra operations underpinning deep learning, including large-scale matrix multiplication, convolution, and embedding lookups.

History and generations

Google began deploying TPUs internally in 2015 for workloads such as Google Search ranking, Translate, and Photos. The first generation (TPU v1) was unveiled at Google I/O 2016 and was inference-only. Later generations expanded both training and inference capacity. TPU v2 (2017) introduced training support and bfloat16 arithmetic. TPU v3 (2018) added liquid cooling. TPU v4 (2021) introduced optical circuit switches for pod-level interconnect. TPU v5e and v5p (2023) targeted cost-efficient inference and high-end training respectively. The sixth-generation Trillium (TPU v6e) reached general availability in December 2024, delivering approximately 4.7 times the peak compute per chip of v5e and doubling high-bandwidth memory capacity. In April 2025, Google unveiled Ironwood (TPU v7), available in 256-chip and 9,216-chip pod configurations and optimised for inference-heavy frontier model workloads.

Architecture

The core of a TPU is a systolic array, a two-dimensional grid of multiply-accumulate units that performs matrix multiplication by streaming data rhythmically through the grid. Operands pass directly from one unit to the next instead of being re-read from memory at every step, which yields very high throughput for dense linear algebra. Recent TPU chips also integrate high-bandwidth memory stacks for fast access to weights and activations, on-chip vector and scalar units for non-matrix operations, and, on several generations, dedicated SparseCore engines that accelerate embedding-heavy models such as recommendation systems.
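The streaming behaviour of a systolic array can be illustrated with a small simulation. The sketch below is a teaching model only: it assumes an output-stationary dataflow (each grid cell keeps one output element) because that variant is easy to follow, and it is not claimed to match the dataflow or timing of any actual TPU generation. The function name systolic_matmul and its cycle count are illustrative.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated n x m systolic grid.

    Cell (i, j) accumulates output element C[i][j]. Rows of `a` enter from
    the left edge and columns of `b` enter from the top edge, each delayed
    (skewed) so that matching operands meet in the right cell.
    """
    n, k, m = len(a), len(a[0]), len(b[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per cell
    a_reg = [[0] * m for _ in range(n)]  # operand currently held, moving right
    b_reg = [[0] * m for _ in range(n)]  # operand currently held, moving down
    cycles = n + m + k - 2               # until the last operands reach the far corner

    for t in range(cycles):
        # Shift operands one cell right / down (reverse order so each cell
        # receives its neighbour's value from the previous cycle).
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed the skewed inputs at the edges; zeros pad the idle slots.
        for i in range(n):
            s = t - i
            a_reg[i][0] = a[i][s] if 0 <= s < k else 0
        for j in range(m):
            s = t - j
            b_reg[0][j] = b[s][j] if 0 <= s < k else 0
        # Every cell performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


if __name__ == "__main__":
    product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, cycles)  # [[19, 22], [43, 50]] 4
```

Each input value is read from the edge once and then reused by every cell it passes through, which is the property that lets a hardware systolic array keep its arithmetic units busy without matching memory traffic.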

TPUs are deployed in pods that link many chips through a high-bandwidth inter-chip interconnect. A Trillium pod scales to 256 chips, while an Ironwood pod scales to 9,216 chips. On generations that use it, Google's optical circuit-switched network connects groups of chips into larger topologies, allowing model and data parallelism across thousands of accelerators with low communication latency.
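Data parallelism across the chips of a pod can be sketched in a few lines. The example below is a plain-Python stand-in, not TPU code: the "chips" are list entries, all_reduce_sum is a hypothetical placeholder for the collective that the interconnect and runtime would perform, and the model is a one-parameter linear fit chosen only to keep the arithmetic visible.

```python
def local_gradient(w, shard):
    """Sum of d/dw (w*x - y)^2 over one chip's shard of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in shard)


def shard_batch(batch, n_chips):
    """Split the batch round-robin so each chip gets a disjoint shard."""
    return [batch[c::n_chips] for c in range(n_chips)]


def all_reduce_sum(values):
    """Stand-in for the interconnect collective: every chip ends up with the total."""
    total = sum(values)
    return [total] * len(values)


def data_parallel_gradient(w, batch, n_chips):
    """Mean gradient over the whole batch, computed shard by shard."""
    shards = shard_batch(batch, n_chips)
    partials = [local_gradient(w, shard) for shard in shards]  # one per chip
    reduced = all_reduce_sum(partials)                         # identical on all chips
    return reduced[0] / len(batch)


if __name__ == "__main__":
    batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    print(data_parallel_gradient(1.0, batch, n_chips=2))  # -15.0
```

The result is the same for any number of chips, because summing the per-shard gradients and dividing by the batch size equals the single-device mean gradient; only the communication step grows with the pod size.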

Software stack

TPUs are programmed primarily through TensorFlow, JAX, and PyTorch via XLA (Accelerated Linear Algebra), a compiler that lowers high-level graphs into TPU-native code. Models trained on GPUs can generally be ported to TPUs by adjusting data pipelines and using bfloat16 mixed-precision training. Google Cloud exposes TPUs through Cloud TPU virtual machines and managed services such as Vertex AI and Google Kubernetes Engine.
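The effect of bfloat16 on numerical precision can be shown without any TPU hardware. bfloat16 keeps the sign bit and all 8 exponent bits of a 32-bit float but only 7 mantissa bits, so it corresponds to the upper 16 bits of the float32 encoding. The sketch below emulates that conversion with round-to-nearest-even using only the standard library; the function names are illustrative, it assumes finite inputs within float32 range, and in practice the framework and compiler perform this conversion.

```python
import struct


def float_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 encoding of x (finite, within float32 range)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    lsb = (bits >> 16) & 1            # lowest bit that survives truncation
    rounded = bits + 0x7FFF + lsb     # round to nearest, ties to even
    return (rounded >> 16) & 0xFFFF


def bfloat16_bits_to_float(b):
    """Expand a 16-bit bfloat16 encoding back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def round_to_bfloat16(x):
    """Value x would take after being stored as bfloat16."""
    return bfloat16_bits_to_float(float_to_bfloat16_bits(x))


if __name__ == "__main__":
    print(round_to_bfloat16(3.14159))  # 3.140625
    print(round_to_bfloat16(1e38))     # still finite: exponent range matches float32
```

The example shows the trade-off behind mixed-precision training: values keep the dynamic range of float32 but retain only about two to three significant decimal digits, which is why accumulations are typically kept in higher precision.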

Comparison with GPUs

GPUs remain the dominant accelerator for general machine learning research because of mature tooling, broad framework support, and easier programmability. TPUs trade some flexibility for higher throughput and energy efficiency on the specific workloads they target, particularly transformer-based language models and recommendation systems. The chips power Google's internal services and frontier models including Gemini, and they are offered to external customers exclusively through Google Cloud.

Applications

TPUs accelerate training and serving of large language models, image and video generation systems, recommendation engines such as YouTube ranking, and scientific computing workloads in fields including protein structure prediction and weather forecasting. Academic researchers can also apply for TPU access through Google's TPU Research Cloud programme.

Malaysian Context — TPU Access and Adoption

Malaysian researchers and enterprises access TPUs primarily through Google Cloud's asia-southeast1 (Singapore) region, whose proximity keeps latency low for Malaysian workloads. In 2024, Google also announced a US$2 billion investment in Malaysia that includes a data centre and a Google Cloud region in Selangor. Although the initial scope focuses on standard cloud and AI services, the buildout is expected to expand high-end accelerator availability to local customers over time.

Public universities including Universiti Malaya, Universiti Sains Malaysia, and Universiti Teknologi Malaysia have used the TPU Research Cloud programme for academic projects in natural language processing and computer vision. Local AI startups developing Bahasa Melayu language models, including efforts coordinated through the National AI Office and supported by the Malaysia Digital Economy Corporation (MDEC), have benefited from subsidised TPU access for pretraining experiments.

The Ministry of Digital and the National AI Office have flagged sovereign compute capacity, including potential local hosting of accelerators, as a strategic priority under the forthcoming National AI Action Plan 2026–2030, which is being coordinated by the Ministry of Science, Technology and Innovation (MOSTI). Malaysian operators such as TM ONE and YTL Data Centers are partnering with hyperscalers to expand domestic AI infrastructure, complementing TPU access through Google Cloud.

Tags: TPU, hardware, Google, accelerator, machine learning

Type	AI accelerator ASIC
Developed by	Google
First released	2016 (TPU v1)
Latest	Ironwood (TPU v7), 2025
Key use	Neural network training and inference
Availability	Google Cloud Platform