Core ML
Core ML is Apple's on-device machine learning framework that enables iOS, macOS, watchOS, and tvOS applications to integrate pre-trained models for tasks including image classification, natural language processing, and sound analysis.
Core ML is Apple's unified machine learning framework, introduced in 2017 with iOS 11, that allows developers to integrate trained machine learning models directly into applications running on Apple devices. Unlike cloud-based AI inference, Core ML executes models entirely on the user's device — leveraging the CPU, GPU, or Apple Neural Engine (ANE) depending on the model type and device generation — without requiring a network connection. This approach provides three principal advantages: low latency, offline availability, and user privacy since raw data never leaves the device.
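Core ML is the foundation of Apple's on-device machine learning stack for application developers, but it is not the lowest layer. Domain frameworks such as Vision (computer vision), Natural Language (NLP), and Sound Analysis build upon Core ML and expose task-specific APIs, while Create ML trains models that Core ML then runs. Core ML itself handles model execution and hardware routing, and in turn builds on lower-level primitives such as Accelerate, BNNS, and Metal Performance Shaders.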
Architecture
Hardware Acceleration
By default, Core ML chooses the compute unit for each part of a model automatically; developers can restrict the choice through the computeUnits property of MLModelConfiguration (for example, CPU only, or CPU and GPU). The Apple Neural Engine (ANE) was introduced with the A11 Bionic chip in 2017 and is present in every later A-series and M-series chip. It provides dedicated matrix multiplication throughput measured in trillions of operations per second (TOPS), with substantially lower power consumption than GPU execution. For operations not supported by the ANE, Core ML falls back to the GPU via Metal, and to the CPU for operations not supported by either accelerator.
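The fallback order described above can be illustrated with a toy dispatcher. This is a sketch only: the operation names and the per-unit support sets are hypothetical, and the real framework partitions a model graph into segments using internal rules that this example does not attempt to reproduce.

```python
# Toy model of per-operation fallback: ANE first, then GPU, then CPU.
# Operation names and support sets are hypothetical, not Core ML's real tables.
PREFERENCE = ("ane", "gpu", "cpu")

SUPPORTED = {
    "ane": {"conv2d", "matmul", "relu"},
    "gpu": {"conv2d", "matmul", "relu", "gather"},
    # The CPU is the final fallback and is assumed to run every operation.
}


def pick_unit(op, allowed=PREFERENCE):
    """Return the first allowed compute unit, in preference order, that supports `op`."""
    for unit in PREFERENCE:
        if unit not in allowed:
            continue
        if unit == "cpu" or op in SUPPORTED[unit]:
            return unit
    raise ValueError(f"no allowed compute unit can run {op!r}")


def plan(ops, allowed=PREFERENCE):
    """Map each operation of a model to a compute unit."""
    return [(op, pick_unit(op, allowed)) for op in ops]


if __name__ == "__main__":
    model_ops = ["conv2d", "relu", "gather", "custom_sort"]
    for op, unit in plan(model_ops):
        print(f"{op:12s} -> {unit}")
    # Excluding the ANE pushes ANE-capable operations to the GPU instead.
    print(plan(model_ops, allowed=("gpu", "cpu")))
```

Passing a narrower `allowed` tuple mimics an application that limits which compute units may be used, for instance to obtain reproducible CPU results while debugging.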
Apple's M-series chips, used in Macs and several iPad models, include larger Neural Engines than earlier designs. Apple rates the M4 Neural Engine at up to 38 TOPS, which makes on-device inference practical for some models that previously required cloud infrastructure.
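A TOPS figure can be turned into a rough lower bound on inference time. The calculation below is a back-of-the-envelope sketch: it assumes the accelerator sustains its peak rate for the whole inference, which real workloads rarely achieve because of memory bandwidth, unsupported operations, and scheduling overhead. The 10-billion-operation model is a hypothetical example.

```python
def min_latency_ms(ops_per_inference, peak_tops):
    """Lower bound on latency in milliseconds, assuming peak throughput is sustained."""
    ops_per_second = peak_tops * 1e12
    return ops_per_inference / ops_per_second * 1e3


if __name__ == "__main__":
    # Hypothetical model needing 10 billion operations per inference.
    print(f"{min_latency_ms(10e9, 38):.3f} ms")  # prints "0.263 ms"
```

Measured latency is normally well above this bound, so the figure is best read as a ceiling on what the hardware could deliver rather than a prediction.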
Model Format
Core ML uses the .mlpackage format and the legacy .mlmodel format. A model package is a directory that bundles the model weights, compute graph, metadata, and optional multi-function definitions. The Core ML Tools Python library converts models from PyTorch and TensorFlow (including Keras models through tf.keras), as well as from scikit-learn, XGBoost, and LibSVM, allowing practitioners to train in their preferred framework before packaging for deployment. Direct ONNX conversion was available in older releases but was removed in coremltools 6, so models are normally converted from their source framework instead.
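Because a model package is an ordinary directory, its contents can be inspected with standard file tools. The sketch below lists every file in a package with its size, using only the Python standard library; the function names are illustrative and no Core ML API is involved. Packages written by recent coremltools releases typically contain a manifest file plus a data directory holding the model specification and a weights file, but that layout should be treated as an observation rather than a documented contract.

```python
import os


def summarize_package(package_dir):
    """Return {relative_path: size_in_bytes} for every file under a package directory."""
    sizes = {}
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, package_dir).replace(os.sep, "/")
            sizes[rel] = os.path.getsize(full)
    return sizes


def print_summary(package_dir):
    """Print one line per file, followed by the total size."""
    sizes = summarize_package(package_dir)
    for rel in sorted(sizes):
        print(f"{sizes[rel]:>12,d}  {rel}")
    print(f"{sum(sizes.values()):>12,d}  total")


if __name__ == "__main__":
    import sys

    # Usage: python inspect_package.py path/to/Model.mlpackage
    if len(sys.argv) == 2:
        print_summary(sys.argv[1])
```

Comparing the size of the weights file before and after compression is a quick way to confirm that a compression pass had the intended effect.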
Multi-Function Models
Beginning with iOS 18 and macOS 15 (and coremltools 8), a single model package can expose multiple functions. For example, one language model file can contain separate functions for prompt encoding, token generation, and decoding. Functions in the same package can share weights, so common parameters are stored once rather than duplicated across files. This design reduces the overhead of managing multiple model files and allows efficient memory management during sequential inference steps, which is particularly relevant for on-device large language model deployments.
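One reason a multi-function package can be cheaper than several separate files is that weights common to more than one function need only be stored once. The sketch below illustrates that idea by keying weight blobs on a content hash. It is a toy model rather than the actual packaging code: the function names "prefill" and "decode", the weight names, and the sizes are all hypothetical.

```python
import hashlib


def pack(functions):
    """Store each distinct weight blob once, keyed by its content hash.

    `functions` maps a function name to {weight_name: bytes}.
    Returns (blobs, index): `blobs` maps hash -> bytes, and `index`
    maps function name -> {weight_name: hash}.
    """
    blobs, index = {}, {}
    for fn_name, weights in functions.items():
        index[fn_name] = {}
        for w_name, data in weights.items():
            digest = hashlib.sha256(data).hexdigest()
            blobs.setdefault(digest, data)  # identical blobs are kept only once
            index[fn_name][w_name] = digest
    return blobs, index


def stored_bytes(blobs):
    """Total size of the deduplicated weight store."""
    return sum(len(b) for b in blobs.values())


if __name__ == "__main__":
    backbone = bytes(1000)  # weights shared by both functions
    functions = {
        "prefill": {"backbone": backbone, "head": b"\x01" * 100},
        "decode": {"backbone": backbone, "head": b"\x02" * 100},
    }
    blobs, index = pack(functions)
    separate = sum(len(d) for w in functions.values() for d in w.values())
    print(separate, stored_bytes(blobs))  # prints "2200 1200"
```

In this example the shared backbone dominates the total, so merging the two functions removes almost half of the stored bytes; the saving shrinks as the functions diverge.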
Supported Domains
Core ML supports a broad range of model types across several AI domains:
- Computer vision: Image classification, object detection, semantic segmentation, style transfer, depth estimation, action classification from video
- Natural language processing: Text classification, language identification, sentiment analysis, word embeddings, named entity recognition
- Sound analysis: Sound classification, speech recognition
- Tabular data: Regression, classification, and recommendation models from tree-based and linear methods
- Generative models: Diffusion models, language model decoding, image generation (aided by the weight compression tooling available in coremltools 7 and later)
Core ML Tools
The coremltools Python package is the primary interface for converting, optimising, and validating Core ML models. It supports:
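- Conversion: from PyTorch (via TorchScript or ExportedProgram), TensorFlow (including Keras models through tf.keras), and classical ML frameworks such as scikit-learn and XGBoost; direct ONNX conversion was removed in coremltools 6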
- Compression: post-training weight compression including palettisation (weight clustering), linear quantisation to INT4/INT8, and pruning
- Validation: numerical comparison between the original framework output and Core ML output to verify conversion correctness
- Flexible shapes: configuring models to accept variable-length inputs, enabling a single model to handle different image resolutions or sequence lengths
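The compression and validation steps listed above can be illustrated without any framework. The sketch below applies symmetric per-tensor INT8 linear quantisation to a small weight vector, restores it, and then performs the kind of numerical comparison a validation step relies on. It is a pure-Python illustration of the technique, not the implementation used by coremltools (which exposes compression through its coremltools.optimize module); the weights, inputs, and tolerance are made-up values.

```python
import math


def quantize_int8(weights):
    """Symmetric per-tensor linear quantisation to signed 8-bit integers."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantised integers back to approximate floating-point weights."""
    return [v * scale for v in q]


def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


def dot(a, b):
    """Stand-in for a model: a single dot product of weights and inputs."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    weights = [0.50, -1.27, 0.03, 0.91, -0.44]
    inputs = [1.0, 2.0, -1.0, 0.5, 3.0]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    # Rounding moves each weight by at most half a quantisation step.
    assert max_abs_error(weights, restored) <= scale / 2 + 1e-12
    # Validation: compare the original output with the compressed output.
    ref, out = dot(weights, inputs), dot(restored, inputs)
    ok = math.isclose(ref, out, abs_tol=0.05)
    print(f"scale={scale:.4f} ref={ref:.4f} out={out:.4f} within_tolerance={ok}")
```

The same pattern scales to real models: run identical inputs through the original and the converted model, then compare outputs against a tolerance chosen for the task.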
Privacy Model
Core ML's on-device execution model means that sensitive inputs such as photographs, voice recordings, health data, and financial transactions do not need to be transmitted to a remote server for inference. This is a significant compliance and trust advantage in regulated industries. A Core ML model is a compute graph with no networking operations, so inference itself generates no network traffic. The guarantee applies to the model, however, not to the host application: an app remains free to transmit inputs or predictions by other means, so end-to-end privacy still depends on how the app handles data.