ONNX (Open Neural Network Exchange)

An open standard format for representing machine learning models that enables interoperability between deep learning frameworks, runtimes, and hardware platforms.

5 min read | Last updated May 2026 | Infrastructure

ONNX, short for Open Neural Network Exchange, is an open-source specification that defines a common, portable representation for machine learning models. By describing a model as a computation graph of standardised operators, ONNX allows a model trained in one framework (such as PyTorch, TensorFlow, scikit-learn, or MATLAB) to be exported and executed in a different runtime or on different hardware without manual rewriting. The project was launched in 2017 by Facebook (now Meta) and Microsoft and is currently governed as a graduated project of the LF AI & Data Foundation, part of the Linux Foundation.

Specification and structure

An ONNX model is serialised as a Protocol Buffers (protobuf) file with the extension .onnx. The file contains a computation graph composed of nodes, each of which references an operator defined in an opset. Operators include common neural-network primitives such as convolution, matrix multiplication, layer normalisation, attention, and activation functions. The specification distinguishes between the core opset and ONNX-ML, an extension that adds traditional machine learning operators such as decision tree ensembles and linear classifiers.

Up to intermediate representation (IR) version 6, ONNX described only inference graphs. From IR version 7 onward, the specification also supports gradients and training, although inference remains the dominant use case. The ONNX 1.22 release extends the operator set for transformer workloads, adding operators such as grouped query attention and rotary position embedding.

Runtimes and execution providers

ONNX itself does not execute models — it is a format. Execution is provided by separate runtimes that consume .onnx files. The most widely used is ONNX Runtime, maintained by Microsoft, which supports CPU, CUDA, ROCm, DirectML, CoreML, and several specialised accelerators through a plug-in mechanism known as execution providers. Other runtimes include NVIDIA TensorRT, Intel OpenVINO, Qualcomm SNPE, and embedded engines targeted at Arm and RISC-V devices. This separation allows organisations to train in any preferred framework while shipping a single artefact that can be optimised per target.

Typical workflow

A typical ONNX pipeline starts with model authoring in PyTorch or TensorFlow, followed by export through framework-specific tools such as torch.onnx.export or tf2onnx. The exported graph can then be inspected, simplified, and optimised using onnx, onnxoptimizer, and onnxsim. Quantisation tools convert FP32 weights to INT8 or FP16 to shrink model size and reduce latency. The optimised artefact is finally loaded into ONNX Runtime or another engine for production serving.

Adoption

ONNX has become a de facto interchange standard in the MLOps ecosystem. It is used by Hugging Face Optimum for accelerated inference, by Azure Machine Learning and Amazon SageMaker as a deployment target, and by Windows ML to run models inside the operating system. Computer vision and speech models — including YOLO variants, Whisper, and many BERT derivatives — are routinely distributed in ONNX form alongside their native checkpoints.

Malaysian Context — ONNX in local deployment and edge AI

Malaysian enterprises increasingly rely on ONNX to bridge the gap between cloud-trained models and on-premises or edge inference. Banks such as Maybank and CIMB, both subject to data residency expectations from Bank Negara Malaysia, often train fraud-detection and credit-scoring models in cloud environments but deploy them inside their own data centres using ONNX Runtime on CPU servers. This separation allows the data-science teams to use modern Python frameworks while platform engineering keeps a stable C++ runtime in production.

The format is also relevant to the manufacturing corridor in Penang and to AITG Sdn Bhd partners working on industrial vision. Factories operated by Intel, Western Digital, and other multinationals in Bayan Lepas use ONNX-exported defect-detection models executed through OpenVINO on Intel CPUs and integrated GPUs, allowing line engineers to update models without changing the runtime stack.

For embedded and edge use cases, ONNX-compatible runtimes such as Arm NN, MediaTek NeuroPilot, and Qualcomm SNPE are common on devices designed in Malaysian electronics hubs. The MyDIGITAL blueprint and the National AI Office Malaysia have both highlighted the importance of model portability in their public materials, framing it as a way to avoid vendor lock-in for government and GLC workloads. HRD Corp-funded training programmes for AI engineers commonly include ONNX export and quantisation as part of the MLOps curriculum.

Universities including Universiti Malaya, Universiti Sains Malaysia, and Universiti Teknologi PETRONAS publish ONNX-formatted research models for medical imaging and remote sensing, supporting reproducibility for researchers across ASEAN who may not have access to identical training stacks.

Limitations

ONNX support varies across frameworks. Some custom or framework-specific operators must be replaced or implemented as user-defined functions before export. Dynamic control flow, very large language models with custom kernels, and rapidly evolving research architectures may not round-trip cleanly. Practitioners often pin a specific opset version per project to maintain stability across training and serving.

References

ONNX Project. (2025). ONNX Intermediate Representation Specification, version 1.22. Linux Foundation AI & Data. onnx.ai.
Microsoft. (2025). ONNX Runtime Documentation. onnxruntime.ai.
Bai, J. et al. (2019). ONNX: Open Neural Network Exchange. GitHub repository, github.com/onnx/onnx.
Splunk. (2025). Open Neural Network Exchange (ONNX) Explained. Splunk Learn.

Tags: onnx interoperability model-format inference mlops

Type	Open model interchange format
Initial release	September 2017
Developed by	Facebook (Meta) and Microsoft; now a Linux Foundation project
Latest release	ONNX 1.22 (2025)
License	Apache 2.0
Key use	Cross-framework model deployment and optimised inference