AIWiki
Malaysia

ONNX (Open Neural Network Exchange)

An open standard format for representing machine learning models that enables interoperability between deep learning frameworks, runtimes, and hardware platforms.

Last updated May 2026 | Infrastructure

ONNX, short for Open Neural Network Exchange, is an open-source specification that defines a common, portable representation for machine learning models. By describing a model as a computation graph of standardised operators, ONNX allows a model trained in one framework, such as PyTorch, TensorFlow, scikit-learn, or MATLAB, to be exported and executed in a different runtime or on different hardware without being rewritten by hand. The project was launched in 2017 by Facebook (now Meta) and Microsoft and is governed as a graduated project of the LF AI & Data Foundation, an umbrella foundation within the Linux Foundation.

Specification and structure

An ONNX model is serialised as a Protocol Buffers (protobuf) file, conventionally with the extension .onnx. The file contains a computation graph composed of nodes, along with the model's weights stored as initializer tensors. Each node invokes an operator from a versioned operator set (opset), and the model declares which opset version it imports for each operator domain. Operators include common neural-network primitives such as convolution, matrix multiplication, layer normalisation, attention, and activation functions. The specification distinguishes between the default ai.onnx domain and ONNX-ML (ai.onnx.ml), an extension that adds traditional machine learning operators such as decision tree ensembles and linear classifiers.

Up to intermediate representation (IR) version 6, ONNX described only inference graphs. From IR version 7 onward, a model can also describe how it is trained, including gradient computation, although inference remains the dominant use case. Separately, recent releases, including ONNX 1.22, have added operators aimed at transformer workloads, such as grouped query attention and rotary position embedding.

Runtimes and execution providers

ONNX itself does not execute models; it is only a format. Execution is provided by separate runtimes that consume .onnx files. The most widely used is ONNX Runtime, maintained by Microsoft, which targets CPUs, CUDA, ROCm, DirectML, CoreML, and several specialised accelerators through plug-in backends known as execution providers. Other runtimes include NVIDIA TensorRT, Intel OpenVINO, and Qualcomm SNPE (TensorRT and OpenVINO are also available as ONNX Runtime execution providers), as well as embedded engines targeting Arm and RISC-V devices. This separation allows organisations to train in their preferred framework while shipping a single artefact that can be optimised for each target.

Typical workflow

A typical ONNX pipeline starts with model authoring in PyTorch or TensorFlow, followed by export through framework-specific tools such as torch.onnx.export or tf2onnx. The exported graph can then be validated with the checker in the onnx package and simplified or optimised with onnxoptimizer and onnx-simplifier (onnxsim). Quantisation tools convert FP32 weights to INT8, and precision-conversion tools cast them to FP16, to shrink model size and reduce latency. The optimised artefact is finally loaded into ONNX Runtime or another engine for production serving.

Adoption

ONNX has become a de facto interchange standard in the MLOps ecosystem. It is used by Hugging Face Optimum for accelerated inference, by Azure Machine Learning and Amazon SageMaker as a deployment target, and by Windows ML to run models inside the operating system. Vision, speech, and language models, including YOLO variants, Whisper, and many BERT derivatives, are routinely distributed in ONNX form alongside their native checkpoints.

Limitations

Export support varies across frameworks. Custom or framework-specific operators may need to be replaced with standard ones, expressed as ONNX functions composed of existing operators, or registered as custom operators in the target runtime before a model can be exported and run. Dynamic control flow, very large language models that rely on custom kernels, and rapidly evolving research architectures may not convert cleanly. Practitioners therefore often pin a specific opset version per project to keep training-time export and serving consistent.
