Search Results
16 results for “inference”
Bayesian Inference
Bayesian inference is a statistical method that uses Bayes' theorem to update the probability of a hypothesis as new evidence becomes available, providing a principled framework for reasoning under uncertainty.
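As a rough illustration of the update described in this entry, the sketch below applies Bayes' theorem to a single binary hypothesis. The function name `bayes_update` and all the probabilities are hypothetical values chosen for the example, not figures from the glossary.

```python
def bayes_update(prior, likelihood, likelihood_given_not):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    # Total probability of seeing the evidence under both hypotheses.
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a 1% prior, and evidence that is 19 times more
# likely when the hypothesis is true (0.95) than when it is false (0.05).
posterior = bayes_update(prior=0.01, likelihood=0.95, likelihood_given_not=0.05)

# A second, independent piece of the same evidence: the previous posterior
# becomes the new prior.
posterior_after_two = bayes_update(posterior, 0.95, 0.05)

print(f"after one observation:  {posterior:.3f}")
print(f"after two observations: {posterior_after_two:.3f}")
```

Feeding each posterior back in as the next prior is what "updating as new evidence becomes available" amounts to in this simple setting.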
Context Window
The maximum number of tokens (including the prompt, prior conversation, retrieved documents, and the model's own output) that a large language model can attend to at one time.
Core ML
Core ML is Apple's on-device machine learning framework that enables iOS, macOS, watchOS, and tvOS applications to integrate pre-trained models for tasks including image classification, natural language processing, and sound analysis.
Edge AI
Edge AI is the deployment of artificial intelligence algorithms and inference workloads directly on local devices or edge computing nodes rather than in centralised cloud data centres, enabling low-latency, privacy-preserving, and bandwidth-efficient AI applications.
Feature Store
A centralised data platform for storing, serving, and managing machine learning features so that they can be reused consistently across training and online inference.
Inference (Machine Learning)
Inference is the phase in which a trained machine learning model is used to generate predictions or outputs from new input data, distinct from the earlier training phase.
Model Pruning
A model compression technique that removes redundant or low-importance parameters from a neural network to reduce size, memory footprint, and inference latency while largely preserving accuracy.
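One common way to decide which parameters count as low-importance is by magnitude. The sketch below assumes that criterion and works on a flat list of weights. The function `magnitude_prune` and the sample values are hypothetical, and real frameworks offer their own pruning utilities.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights; half of them are removed.
weights = [0.9, -0.02, 0.4, 0.01, -0.7, 0.05]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Zeroed weights reduce size and latency only when the storage format or runtime can exploit the sparsity, and accuracy is usually re-checked (and often recovered by fine-tuning) after pruning.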
Model Serving
Model serving is the discipline of deploying trained machine learning models behind APIs or runtimes so that production applications can request predictions at scale with predictable latency, throughput, and reliability.
Monte Carlo Methods
A broad class of computational algorithms that use repeated random sampling to obtain numerical results, widely used in machine learning for Bayesian inference, reinforcement learning, and uncertainty estimation.
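A classic toy example of repeated random sampling is estimating π from the fraction of random points that fall inside a quarter circle. The sketch below uses only the standard library. The function name `estimate_pi`, the sample count, and the seed are arbitrary choices for illustration.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Count points that land inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of quarter circle / area of square = pi / 4.
    return 4.0 * inside / num_samples


pi_estimate = estimate_pi(100_000)
print(pi_estimate)
```

The error of such an estimate shrinks roughly with the square root of the number of samples, which is why Monte Carlo methods trade compute for accuracy.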
ONNX (Open Neural Network Exchange)
An open standard format for representing machine learning models that enables interoperability between deep learning frameworks, runtimes, and hardware platforms.
OpenVINO
OpenVINO is an open-source toolkit developed by Intel for optimising and deploying deep learning inference across Intel hardware, including CPUs, GPUs, Neural Processing Units, and FPGAs, with broad support for major AI frameworks and model formats.
PDPA AI Compliance
PDPA AI compliance refers to the application of Malaysia's Personal Data Protection Act 2010 to artificial intelligence systems, governing how personal data may be collected, processed, and used in AI training, inference, and deployment.
Quantisation
Quantisation is a model compression technique that reduces the numerical precision of a neural network's weights and activations from high-bit floating-point formats to lower-bit representations, decreasing memory usage and accelerating inference with minimal accuracy loss.
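To make the precision reduction concrete, the sketch below assumes the simplest scheme, symmetric 8-bit quantisation with one shared scale factor. The helper names `quantise_int8` and `dequantise` and the sample weights are hypothetical. Production toolchains typically use per-channel scales, zero points, and calibration data.

```python
def quantise_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantised = [max(-127, min(127, round(v / scale))) for v in values]
    return quantised, scale


def dequantise(quantised, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantised]


# Hypothetical weights for illustration.
weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q, scale, max_error)
```

Each value now needs one byte instead of four, and the rounding error per value is bounded by half the scale, which is where the accuracy loss comes from.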
Retrieval-Augmented Generation
A technique that enhances large language model outputs by retrieving relevant documents from an external knowledge base at inference time, grounding responses in up-to-date and domain-specific information.
Tensor Processing Unit
A tensor processing unit (TPU) is a custom application-specific integrated circuit developed by Google for accelerating machine learning workloads, particularly neural network training and inference.
TensorFlow Lite
TensorFlow Lite is an open-source deep learning framework from Google for running optimised machine learning models on mobile phones, microcontrollers, and other edge devices.