What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, the local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

GPU Cluster

A GPU cluster is a networked group of servers, each containing one or more graphics processing units, purpose-built to accelerate parallel computation workloads such as deep learning training and large-scale AI inference.

6 min read | Last updated June 2026 | Infrastructure

A GPU cluster is an ensemble of interconnected compute nodes, each equipped with one or more graphics processing units (GPUs), designed to execute massively parallel computational tasks. GPUs were originally developed for graphics rendering, but their parallel architecture suits the matrix arithmetic at the heart of neural networks. GPU clusters became the dominant infrastructure for training large deep learning models from approximately 2012 onwards, and today underpin virtually all large-scale AI research and production deployments.

Architecture

A GPU cluster consists of multiple nodes connected over a high-speed fabric. Each node typically contains one to eight GPUs, each with its own high-bandwidth memory (HBM), alongside CPUs, system memory, local storage, and network interface cards. The nodes are linked by specialised interconnects that enable rapid exchange of gradients and activations during distributed training.

NVIDIA's NVLink and NVSwitch technologies provide high-bandwidth, low-latency GPU-to-GPU communication within a single node, while InfiniBand and Ethernet with RDMA (Remote Direct Memory Access, typically via RoCE, RDMA over Converged Ethernet) connect nodes across the cluster. InfiniBand, at 200 Gb/s per port in the HDR generation and 400 Gb/s in the NDR generation, is a preferred fabric for many of the largest training clusters.
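Link rates are quoted in gigabits per second while model and gradient sizes are usually given in bytes, so a quick conversion helps gauge what a single link can move. The sketch below is a rough estimate assuming 100% link utilisation with no protocol overhead or latency; the helper transfer_seconds and the 16 GB payload (FP16 gradients of a hypothetical 8-billion-parameter model) are illustrative assumptions.

```python
def transfer_seconds(payload_bytes: float, link_gbps: float) -> float:
    """Idealised time to push payload_bytes over one link rated at link_gbps gigabits/s.

    Assumes full link utilisation and ignores protocol overhead and latency.
    """
    bits = payload_bytes * 8
    return bits / (link_gbps * 1e9)


# Hypothetical payload: FP16 gradients (2 bytes each) of an 8-billion-parameter model.
payload = 8e9 * 2  # 16 GB

for name, gbps in [("HDR InfiniBand", 200), ("NDR InfiniBand", 400)]:
    print(f"{name}: {transfer_seconds(payload, gbps):.2f} s")
```

Under these assumptions the payload takes about 0.64 s on a 200 Gb/s link and 0.32 s on a 400 Gb/s link; real transfers are slower once overhead and contention are included.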

Storage infrastructure in GPU clusters must match the I/O throughput of training workloads. High-performance parallel file systems such as IBM Spectrum Scale (GPFS) and Lustre are commonly deployed, alongside object storage tiers for dataset archiving.
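To see why storage throughput matters, the aggregate read rate a training job needs can be estimated as GPUs × samples per GPU per second × bytes per sample. The sketch assumes every sample is read from shared storage (no local caching, compression, or reuse); the helper name and the workload figures are hypothetical.

```python
def required_read_gb_per_s(num_gpus: int, samples_per_gpu_per_s: float,
                           bytes_per_sample: float) -> float:
    """Aggregate storage read rate (GB/s) needed to keep every GPU busy.

    Assumes each sample is fetched from shared storage with no caching or compression.
    """
    return num_gpus * samples_per_gpu_per_s * bytes_per_sample / 1e9


# Hypothetical workload: 512 GPUs, each consuming 1,000 samples/s of ~150 KB each.
rate = required_read_gb_per_s(512, 1_000, 150_000)
print(f"Required sustained read throughput: {rate:.1f} GB/s")
```

With these illustrative numbers the file system must sustain roughly 76.8 GB/s, a rate that motivates parallel file systems rather than a single storage server.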

Role in AI Training

Training large deep learning models requires enormous computational throughput that no single GPU can provide. Distributed training across a GPU cluster is achieved through two primary parallelism strategies.

Data parallelism divides each training batch across GPUs (workers); every worker holds a full copy of the model and processes a different minibatch. After each forward and backward pass, gradients are averaged across workers, typically using the AllReduce collective communication operation, so every replica applies the same update. This approach scales well as long as the model, together with its gradients and optimizer state, fits within a single GPU's memory.
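A minimal in-process sketch of data-parallel training, assuming a toy one-parameter linear model (y = w·x) and a simulated AllReduce that simply averages values. The helper names local_gradient, all_reduce_mean, and data_parallel_sgd are hypothetical and do not correspond to any framework's API. Four simulated workers each hold a model replica and a quarter of the data; because their gradients are averaged before every update, all replicas stay identical.

```python
from typing import List, Tuple


def local_gradient(w: float, xs: List[float], ys: List[float]) -> float:
    """Gradient of mean squared error for the toy model y_hat = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values: List[float]) -> List[float]:
    """Simulated AllReduce: every worker receives the mean of all workers' values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def data_parallel_sgd(xs: List[float], ys: List[float], num_workers: int = 4,
                      lr: float = 0.01, steps: int = 200) -> List[float]:
    # Each worker starts with an identical replica of the one-parameter model.
    replicas = [0.0] * num_workers
    # Shard the data: worker k takes every num_workers-th example.
    shards: List[Tuple[List[float], List[float]]] = [
        (xs[k::num_workers], ys[k::num_workers]) for k in range(num_workers)
    ]
    for _ in range(steps):
        # Forward and backward pass on each worker's own shard.
        grads = [local_gradient(w, sx, sy) for w, (sx, sy) in zip(replicas, shards)]
        # Synchronise gradients so every replica applies the same update.
        synced = all_reduce_mean(grads)
        replicas = [w - lr * g for w, g in zip(replicas, synced)]
    return replicas


xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]  # the true weight is 3.0
replicas = data_parallel_sgd(xs, ys)
print(replicas)
```

With equal-sized shards, the average of the workers' local mean gradients equals the full-batch gradient, which is why this scheme matches single-device training on the same global batch.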

Model parallelism partitions the model itself across GPUs: tensor parallelism splits individual layers' weight matrices across devices, while pipeline parallelism assigns groups of consecutive layers to different devices. This is necessary for models too large for a single device's memory, a category that includes most modern large language models. Frameworks such as NVIDIA's Megatron-LM and Microsoft's DeepSpeed implement combinations of data, tensor, and pipeline parallelism to train models with hundreds of billions of parameters.
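One common tensor-parallel pattern splits a linear layer's weight matrix by columns: each device multiplies the full input by its own column shard, and the output slices are concatenated. The pure-Python sketch below simulates this on lists, assuming the column count divides evenly by the number of devices; the helper names are hypothetical and this is not Megatron-LM's or DeepSpeed's actual code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix multiply: (rows of a) x (columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split w column-wise into `parts` equal shards, one per simulated GPU."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly across devices"
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def column_parallel_linear(x: Matrix, w: Matrix, parts: int) -> Matrix:
    shards = split_columns(w, parts)
    # Each simulated GPU computes only its slice of the output columns.
    partial_outputs = [matmul(x, shard) for shard in shards]
    # Gather step: concatenate the slices row by row to rebuild the full output.
    return [sum((out[i] for out in partial_outputs), []) for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
print(column_parallel_linear(x, w, parts=2))
print(matmul(x, w))  # same result computed on a single "device"
```

Each device stores only half of the weight matrix, which is the memory saving that makes tensor parallelism useful; the price is a communication step to gather the output slices.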

Hardware Generations

GPU hardware has undergone rapid generational advancement. NVIDIA's H100 (Hopper architecture, 2022) provides up to 3.35 terabytes per second of memory bandwidth in its SXM form, and the H200 (2024) raises this to 4.8 terabytes per second with 141 GB of HBM3e; both support the FP8 numerical format, which accelerates transformer training. AMD's MI300X accelerator, released in 2023, offers 192 GB of HBM3 memory per device, making it competitive for memory-bound workloads.
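Per-device memory capacity determines whether a model fits on one GPU at all. The sketch below compares a rough footprint (parameters × bytes per parameter) with the capacities of the H100 SXM (80 GB), H200 (141 GB), and MI300X (192 GB). It counts weights or training state only and ignores activations and KV caches, so real requirements are higher. The 70-billion-parameter model is hypothetical, and the 16 bytes per parameter for mixed-precision Adam training is a commonly cited estimate (FP16 weights and gradients plus FP32 master weights and two optimizer moments), not a measured figure.

```python
from typing import List

# Per-device memory in GB for the accelerators named in the text.
DEVICE_MEMORY_GB = {"H100 SXM": 80, "H200": 141, "MI300X": 192}


def footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Rough memory footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def devices_that_fit(gb: float) -> List[str]:
    """Devices whose memory can hold `gb`, ignoring activations and runtime overhead."""
    return [name for name, cap in DEVICE_MEMORY_GB.items() if gb <= cap]


params = 70e9  # hypothetical 70-billion-parameter model
for label, bpp in [("FP16 weights", 2), ("FP8 weights", 1),
                   ("Adam mixed-precision training state (est.)", 16)]:
    gb = footprint_gb(params, bpp)
    print(f"{label}: {gb:.0f} GB, fits on: {devices_that_fit(gb) or 'no single device'}")
```

Even where inference weights fit on a single large-memory device, the full training state for a model of this size does not, which is why such models are trained with model parallelism or sharded optimizer state.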

The largest publicly disclosed single training clusters exceeded 100,000 GPUs as of 2025. Meta has described total compute capacity of approximately 600,000 H100-equivalent GPUs across its infrastructure, and Microsoft and OpenAI operate systems of comparable scale. The GPU-as-a-service market was valued at approximately USD 4.96 billion in 2025 and is projected to reach USD 31.89 billion by 2034, reflecting sustained demand for compute.
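The market projection above implies a compound annual growth rate (CAGR) of (end / start)^(1 / years) − 1. A short sketch applying that standard formula to the quoted figures, treating 2025 to 2034 as nine years of growth:

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the text, in USD billions; 2025 -> 2034 is nine growth years.
rate = cagr(4.96, 31.89, 2034 - 2025)
print(f"Implied CAGR: {rate:.1%}")
```

The projection therefore corresponds to growth of roughly 23% per year.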

Cloud and On-Premises Deployments

GPU clusters are available through major cloud providers — Amazon Web Services (via EC2 P4 and P5 instances), Google Cloud (via A3 and A3 Mega instances), Microsoft Azure (via NDv5 series), and Oracle Cloud Infrastructure — as well as GPU-focused cloud providers such as Lambda Labs, CoreWeave, and Together AI. Organisations with sustained, high-volume workloads sometimes invest in dedicated on-premises hardware to reduce long-term costs.

The choice between cloud and on-premises deployment depends on workload predictability, tolerance for capital expenditure, data residency requirements, and the lead time for hardware procurement. A cluster designed around a specific workload, with its network topology and storage tier sized for that workload, can be significantly more efficient than general-purpose cloud instances for large, sustained training programmes.
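Workload predictability matters because on-premises hardware only pays back if it is kept busy. A simple break-even sketch divides the upfront cost by the monthly saving versus renting equivalent cloud GPUs. Every figure below (capital cost, operating cost, hourly cloud rate, utilisation) is a hypothetical illustration, not a vendor quote, and the model ignores depreciation, financing, and price changes.

```python
def breakeven_months(capex: float, onprem_monthly_opex: float,
                     cloud_hourly_per_gpu: float, num_gpus: int,
                     utilised_hours_per_month: float) -> float:
    """Months until on-premises capex is recovered by avoided cloud spend."""
    cloud_monthly = cloud_hourly_per_gpu * num_gpus * utilised_hours_per_month
    monthly_saving = cloud_monthly - onprem_monthly_opex
    if monthly_saving <= 0:
        return float("inf")  # at this utilisation, on-premises never pays back
    return capex / monthly_saving


# Hypothetical 64-GPU deployment: USD 2M upfront, USD 40k/month to run,
# versus renting cloud GPUs at USD 3.00 per GPU-hour.
for hours in (200, 600):
    months = breakeven_months(2_000_000, 40_000, 3.00, 64, hours)
    print(f"{hours} busy hours/month -> break-even after {months:.1f} months")
```

With these assumptions, heavy use (600 busy hours a month) recovers the investment in a little over two years, while light use (200 hours) never does, which illustrates why sustained, predictable workloads favour dedicated hardware.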

Challenges

GPU clusters present several operational challenges. Fault tolerance is critical: even if each node rarely fails, the chance that at least one of 1,000 nodes fails during a 24-hour run is substantial, and a single failure can halt a synchronous training job. Checkpoint-based recovery, which periodically saves model weights and optimizer state to persistent storage, is standard practice, and PyTorch's elastic training support (TorchElastic, launched via torchrun) allows a job to restart with a changed set of nodes.
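The compounding effect can be made concrete: with independent failures and a per-node probability p over some window, the chance that at least one of N nodes fails is 1 − (1 − p)^N. A related rule of thumb for how often to checkpoint is Young's first-order approximation, interval ≈ √(2 × C × M), where C is the time to write a checkpoint and M is the job's mean time between failures. The sketch below assumes independent failures, and all input values are hypothetical.

```python
import math


def p_any_failure(num_nodes: int, p_node: float) -> float:
    """Probability that at least one node fails, assuming independent failures."""
    return 1.0 - (1.0 - p_node) ** num_nodes


def young_checkpoint_interval(checkpoint_seconds: float, job_mtbf_seconds: float) -> float:
    """Young's first-order estimate of the optimal time between checkpoints: sqrt(2*C*M)."""
    return math.sqrt(2.0 * checkpoint_seconds * job_mtbf_seconds)


# Hypothetical: 1,000 nodes, each with a 0.1% chance of failing in a 24-hour window.
p_day = p_any_failure(1000, 0.001)
print(f"P(at least one failure in 24 h): {p_day:.1%}")

# Hypothetical: job-level MTBF of one day and a 5-minute checkpoint write.
interval = young_checkpoint_interval(300.0, 86_400.0)
print(f"Suggested checkpoint interval: {interval / 3600:.1f} h")
```

Under these assumptions a 0.1% per-node daily failure rate still gives roughly a 63% chance of at least one failure per day, and the suggested checkpoint interval comes out at about two hours.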

Power and cooling are major engineering constraints. A single H100 GPU in the SXM form factor has a thermal design power of up to 700 watts; a cluster of 10,000 such GPUs therefore draws approximately 7 megawatts for the GPUs alone, before adding CPUs, networking, storage, and cooling. As a result, data centre design for AI workloads differs substantially from conventional enterprise computing, requiring liquid cooling, high-density power distribution, and specialised networking infrastructure.
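The power figure scales linearly: GPU count × per-GPU power gives the GPU load, which is then multiplied by a factor for the rest of the IT equipment and by the facility's power usage effectiveness (PUE) for cooling and power conversion. In the sketch below, the 1.5× non-GPU overhead and the PUE of 1.3 are hypothetical placeholders chosen for illustration, not measured values.

```python
HOURS_PER_YEAR = 8760


def cluster_power_mw(num_gpus: int, gpu_watts: float,
                     non_gpu_overhead: float = 1.0, pue: float = 1.0) -> float:
    """Sustained power in megawatts.

    non_gpu_overhead scales GPU power to cover CPUs, NICs, and switches;
    pue scales IT power to cover cooling and power conversion losses.
    """
    return num_gpus * gpu_watts * non_gpu_overhead * pue / 1e6


gpu_only = cluster_power_mw(10_000, 700)
# Hypothetical factors: 1.5x for non-GPU IT equipment and a PUE of 1.3.
facility = cluster_power_mw(10_000, 700, non_gpu_overhead=1.5, pue=1.3)
print(f"GPUs only: {gpu_only:.1f} MW")
print(f"Facility estimate: {facility:.2f} MW, "
      f"about {facility * HOURS_PER_YEAR:,.0f} MWh per year")
```

With these placeholder factors, the same 10,000 GPUs would need close to twice the GPU-only figure at the facility level.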

Network bandwidth between nodes is frequently the bottleneck in distributed training. The AllReduce operation that synchronises gradients at the end of each training step must combine every GPU's gradients and deliver the result to all of them; in a ring-based AllReduce, each GPU sends and receives roughly twice the size of the full gradient every step, so the slowest link sets the pace. Insufficient inter-node bandwidth therefore degrades scaling efficiency, which is why high-performance clusters invest heavily in low-latency, high-bandwidth fabrics.
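Ring AllReduce is one widely used AllReduce algorithm (NCCL, for example, supports ring-based collectives). The in-process simulation below sketches its two phases: a reduce-scatter, in which each worker ends up owning the full sum of one chunk, and an all-gather, in which the completed chunks circulate until every worker holds the whole result. It assumes the buffer length divides evenly by the number of workers, and the function names are hypothetical. Each worker sends 2(N − 1) chunks of size S / N, that is 2(N − 1) / N × S in total, which approaches 2S as N grows.

```python
from typing import List


def ring_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring AllReduce (sum) across len(buffers) workers."""
    n = len(buffers)
    length = len(buffers[0])
    assert length % n == 0, "buffer length must divide evenly into n chunks"
    chunk = length // n
    data = [list(b) for b in buffers]  # each worker's local copy

    def span(c: int) -> slice:
        return slice(c * chunk, (c + 1) * chunk)

    # Reduce-scatter: at step s, worker r sends chunk (r - s) mod n to its right
    # neighbour, which adds it to its own copy. Sends are snapshotted first
    # because all workers transmit simultaneously.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        payloads = [data[r][span(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]
    # Now worker r holds the complete sum of chunk (r + 1) mod n.

    # All-gather: at step s, worker r forwards chunk (r + 1 - s) mod n, overwriting
    # the neighbour's stale copy, until every worker has every completed chunk.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [data[r][span(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            data[(r + 1) % n][span(c)] = payload
    return data


def ring_bytes_sent_per_worker(n: int, message_bytes: float) -> float:
    """Bytes each worker sends in a ring AllReduce of a message_bytes buffer."""
    return 2 * (n - 1) / n * message_bytes


buffers = [[float(r * 10 + i) for i in range(8)] for r in range(4)]
result = ring_all_reduce(buffers)
print(result[0])
print(ring_bytes_sent_per_worker(4, 1000))
```

Because per-worker traffic stays close to twice the gradient size regardless of cluster size, step time depends mainly on per-link bandwidth, which is why the fabric, rather than GPU count, often limits scaling.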

Malaysian Context — GPU Infrastructure and AI Compute in Malaysia

Malaysia has emerged as a significant AI infrastructure hub in Southeast Asia, attracting major data centre investments driven by the country's competitive land costs, renewable energy potential, and strategic position within the ASEAN digital economy. The government's National AI Roadmap and MyDigital Blueprint both identify compute infrastructure as a foundational pillar of the national AI strategy.

NVIDIA established a regional partnership with YTL Power International to develop Malaysia's first large-scale AI data centre, announced in 2024. YTL's JHD Campus in Johor, powered by Malaysia's existing energy grid and planned solar assets, is designed to host thousands of NVIDIA H100 GPUs for AI training and inference. Microsoft announced a USD 2.2 billion investment in Malaysia's cloud and AI infrastructure in May 2024, which includes GPU-backed instances accessible to Malaysian enterprises through Azure. Google and Amazon Web Services have also announced or expanded data centre presences in Malaysia.

Telekom Malaysia (TM) operates high-density data centres that give enterprises access to GPU compute without capital expenditure. MDEC has facilitated discussions between hyperscalers and Malaysian industry to help local companies access AI compute at competitive rates. Maxis and CelcomDigi have explored GPU-backed edge compute offerings to reduce inference latency for enterprise customers in manufacturing and logistics.

Malaysian universities — including Universiti Malaya, Universiti Teknologi Malaysia, and UTAR — have acquired GPU infrastructure for AI research, partly funded through MOSTI research grants. HRD Corp-registered training providers offer CUDA and distributed training courses, building the talent base needed to operate and utilise GPU clusters effectively for Malaysian AI projects.

References

Shoeybi, M., et al. (2019). Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv:1909.08053.
Clarifai. (2024). What Are GPU Clusters and How They Accelerate AI Workloads. clarifai.com.
Scale Computing. (2024). GPU Cluster Explained: Architecture, Nodes and Use Cases. scalecomputing.com.
Epoch AI. (2025). Data on GPU clusters. epoch.ai.
Cyfuture AI. (2026). Top 10 GPU Cluster Services for AI Training and Machine Learning. cyfuture.ai.

Tags: hardware, computing, deep-learning, infrastructure

Type	High-performance computing infrastructure
Key vendors	NVIDIA, AMD
Primary use	AI model training and inference
Key interconnect	NVLink/NVSwitch, InfiniBand, Ethernet with RDMA (RoCE)
Related	TPU, CUDA, Deep Learning, MLOps