What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

CUDA

NVIDIA's parallel computing platform and programming model that lets developers use GPUs for general-purpose computation, underpinning most modern deep learning frameworks.

4 min read | Last updated May 2026 | Infrastructure

CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and programming model that exposes general-purpose computation on graphics processing units. Released publicly in 2007 after several years of internal development, CUDA turned GPUs from graphics-specific accelerators into massively parallel processors usable for scientific computing, signal processing, finance, and, most consequentially, deep learning. Almost every mainstream deep learning framework runs on top of the CUDA stack, and the ecosystem of more than three hundred CUDA acceleration libraries and around six million registered developers is widely cited as NVIDIA's most durable competitive moat in the AI hardware market.

Programming model

CUDA extends C and C++ with a small set of keywords that distinguish code that runs on the host CPU from code that runs on the GPU device. Developers write kernels, which are functions executed in parallel by many threads, and launch them over a grid of thread blocks. Threads within a block share fast on-chip memory (shared memory) and can synchronise, while blocks execute independently and can be scheduled across the streaming multiprocessors of any compatible GPU. Modern CUDA also provides cooperative groups, unified memory that migrates pages between host and device on demand, and CUDA Graphs, which capture a sequence of asynchronous kernel launches and memory copies so the whole pipeline can be replayed with low launch overhead.
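The grid, block, and thread hierarchy described above can be illustrated without a GPU. The following is a minimal sketch in plain Python that emulates a one-dimensional kernel launch on the CPU. It is not CUDA code and runs sequentially. The helper names launch and vector_add are hypothetical and chosen for illustration only. The index arithmetic mirrors the blockIdx.x * blockDim.x + threadIdx.x expression used in CUDA C++ kernels.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """CPU emulation of a 1-D kernel launch.

    Calls the kernel once per (block, thread) pair. A real GPU runs these
    invocations in parallel; this loop only reproduces the indexing.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(block_idx, block_dim, thread_idx, a, b, out):
    # Global index, as in blockIdx.x * blockDim.x + threadIdx.x.
    i = block_idx * block_dim + thread_idx
    # Bounds guard: the last block may contain threads past the end of the data.
    if i < len(out):
        out[i] = a[i] + b[i]


n = 10
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]
out = [0.0] * n

threads_per_block = 4
# Ceiling division so that every element is covered by some thread.
blocks = (n + threads_per_block - 1) // threads_per_block

launch(vector_add, blocks, threads_per_block, a, b, out)
print(out)
```

With 10 elements and 4 threads per block, the launch uses 3 blocks (12 threads), and the bounds guard makes the 2 surplus threads do nothing. The same pattern appears in most introductory CUDA kernels.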

Python bindings such as Numba, CuPy, and PyCUDA make CUDA accessible from Python, in many cases without writing CUDA C++ kernels by hand, while frameworks like PyTorch, TensorFlow, and JAX hide CUDA entirely behind familiar tensor APIs.
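A practical first step when working with these packages is finding out which of them are installed in a given environment. The sketch below uses only the standard library to check for the packages named above. The function name available_gpu_packages is hypothetical. Note the limitation: a package being importable does not imply that a CUDA-capable GPU or driver is present, and each package has its own runtime check for that.

```python
from importlib.util import find_spec

# Top-level import names of the packages mentioned in the text.
CANDIDATES = ("numba", "cupy", "pycuda", "torch", "tensorflow", "jax")


def available_gpu_packages(names=CANDIDATES):
    """Return {package name: True/False} without importing the packages."""
    return {name: find_spec(name) is not None for name in names}


report = available_gpu_packages()
for name, present in report.items():
    print(f"{name:<12}{'installed' if present else 'not installed'}")
```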

The CUDA ecosystem

A large portion of CUDA's value comes from optimised libraries that NVIDIA distributes alongside the toolkit. cuDNN provides hand-tuned implementations of convolutions, attention, and recurrent operators that every major deep learning framework calls into. cuBLAS and cuSPARSE accelerate dense and sparse linear algebra. NCCL handles multi-GPU collective communication. TensorRT compiles trained networks into highly optimised inference engines with quantisation and kernel fusion. Triton Inference Server packages those engines for production deployment. RAPIDS extends the model to data science, providing GPU-accelerated equivalents of pandas (cuDF), scikit-learn (cuML), and NetworkX (cuGraph).
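To make the quantisation step mentioned above concrete, the following sketch shows symmetric per-tensor INT8 quantisation in plain Python. This is a generic textbook scheme chosen for illustration. It is an assumption that it resembles what an inference compiler does, and it is not TensorRT's actual calibration algorithm or API. The function names are hypothetical.

```python
def quantise_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantised = [max(-127, min(127, round(v / scale))) for v in values]
    return quantised, scale


def dequantise(quantised, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantised]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)

print(q)
print(restored)
```

Each restored value differs from the original by at most half a quantisation step (scale / 2), which is the accuracy traded for smaller weights and faster integer arithmetic.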

Hardware support

CUDA is tied to NVIDIA GPUs and exposes successive generations through a versioned compute capability. Recent generations include Pascal, Volta, Turing, Ampere, Hopper, and the Blackwell architecture announced in 2024. In September 2025 NVIDIA announced Rubin CPX, a GPU class purpose-built for massive-context inference workloads. Since Volta, each generation has included specialised matrix-multiplication units, called Tensor Cores, that accelerate the mixed-precision arithmetic at the core of transformer training and inference.
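Compute capability is written as a major.minor pair, and software commonly gates features on a minimum value. The sketch below maps a few well-known capability values to architecture names and compares versions numerically. The table is a partial, illustrative selection rather than a complete list, and the function names are hypothetical. Tensor Cores first appeared at compute capability 7.0 (Volta), which is why 7.0 is used as the example threshold.

```python
# Partial, illustrative table of (major, minor) -> architecture name.
ARCH_BY_CAPABILITY = {
    (6, 0): "Pascal",
    (6, 1): "Pascal",
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (9, 0): "Hopper",
    (10, 0): "Blackwell",
}


def parse_capability(text):
    """Turn a string such as '8.6' into the tuple (8, 6)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def architecture(text):
    """Look up the architecture name, or 'unknown' if not in the table."""
    return ARCH_BY_CAPABILITY.get(parse_capability(text), "unknown")


def meets_minimum(device, minimum):
    """Compare as integer tuples, so that 10.0 correctly ranks above 9.0."""
    return parse_capability(device) >= parse_capability(minimum)


for cc in ("6.1", "7.0", "9.0", "10.0"):
    print(cc, architecture(cc), meets_minimum(cc, "7.0"))
```

Comparing integer tuples rather than strings matters here: as strings, "10.0" would sort before "9.0".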

Alternatives

AMD's ROCm and the open SYCL standard target the same general-purpose GPU computing space without requiring NVIDIA hardware, and Intel's oneAPI provides a portable runtime spanning CPUs, GPUs, and accelerators. Apple GPUs use the proprietary Metal API. The ZLUDA translation layer and AMD's HIP, a CUDA-like API with accompanying source-conversion tools, allow some CUDA code to run on AMD hardware, but the breadth of the CUDA library ecosystem means that most production AI workloads in 2025 still target NVIDIA GPUs first.
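Part of what makes porting to HIP feasible is that many HIP runtime calls mirror their CUDA counterparts by name. The toy sketch below renames a handful of CUDA runtime identifiers to their HIP equivalents with a regular expression. It is an illustration of the naming correspondence only, not a reimplementation of AMD's conversion tools, which handle far more than simple renames. The mapping table covers just the identifiers used in the sample, and the function name to_hip is hypothetical.

```python
import re

# Small, illustrative subset of CUDA runtime names and their HIP equivalents.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Longest names first so a short name never shadows a longer one.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def to_hip(source):
    """Replace each known CUDA identifier with its HIP counterpart."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


cuda_source = """#include <cuda_runtime.h>
cudaMalloc(&d_x, bytes);
cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

hip_source = to_hip(cuda_source)
print(hip_source)
```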

Malaysian Context — CUDA Infrastructure in Malaysia

Malaysia has become a significant Southeast Asian hub for CUDA-accelerated AI infrastructure. In 2026 AIMS Data Centre announced a 200 MW AI-ready data centre in Cyberjaya, scheduled to open in 2027 with an estimated investment of RM4 billion. YTL Power partnered with NVIDIA to operate an AI cloud built on Hopper and Blackwell GPUs at its campus in Kulai, Johor, and Bridge Data Centres, NTT, and Equinix have all expanded GPU-ready capacity in the Klang Valley and Johor.

Universities including Universiti Malaya (UM), Universiti Sains Malaysia (USM), and Universiti Teknologi Malaysia (UTM) operate NVIDIA-based research clusters used for materials science, computational biology, and large language model fine-tuning. MIMOS Berhad and the National AI Office (NAIO) under the Ministry of Digital coordinate national-level access to GPU resources for public-sector and research projects.

The Malaysia Digital Economy Corporation (MDEC) lists CUDA programming and accelerated computing among the courses offered through its NVIDIA Deep Learning Institute partnerships, and employers can use HRD Corp claims to subsidise CUDA training delivered through Cyberjaya providers. The Penang Skills Development Centre (PSDC) and Selangor Information Technology and Digital Economy Corporation (SIDEC) include CUDA-based AI engineering in their reskilling tracks for manufacturing and electronics professionals.

For partner solutions such as AITG Sdn Bhd's Teragrid Agent and Teragrid Ai Platform that run inference on AWS Bedrock or NVIDIA GPUs, CUDA compatibility — and the matching cuDNN, TensorRT, and NCCL versions — is a routine deployment consideration.
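A version check of this kind can be sketched in a few lines of Python. The minimum versions in the table below are placeholders for illustration only. They are assumptions, not requirements published by NVIDIA or by any vendor named here, and real deployments should take the values from the official support matrices. The function names are hypothetical.

```python
def parse_version(text):
    """Turn '12.4' or '8.9.7' into a tuple of integers for comparison."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical minimums, for illustration only.
REQUIRED_MINIMUMS = {
    "cuda": "12.0",
    "cudnn": "9.0",
    "tensorrt": "10.0",
    "nccl": "2.18",
}


def check_stack(installed, required=REQUIRED_MINIMUMS):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for component, minimum in required.items():
        found = installed.get(component)
        if found is None:
            problems.append(f"{component}: not installed (need >= {minimum})")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{component}: {found} is older than required {minimum}")
    return problems


# Example inventory of a deployment target (made-up values).
installed = {"cuda": "12.4", "cudnn": "8.9.7", "nccl": "2.21.5"}
problems = check_stack(installed)
for line in problems:
    print(line)
```

Running a check like this before deployment surfaces mismatches early, instead of as a library loading error at inference time.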

Tags: gpu, nvidia, parallel computing, deep learning

Full name	Compute Unified Device Architecture
Developed by	NVIDIA Corporation
First released	23 June 2007
Latest stable	CUDA Toolkit 13.x (2025)
Language bindings	C, C++, Fortran, Python (via Numba, CuPy)
Related	cuDNN, cuBLAS, TensorRT, ROCm (AMD alternative)