AIWiki
Malaysia

Search Results

16 results for inference

Foundations

Bayesian Inference

Bayesian inference is a statistical method that uses Bayes' theorem to update the probability of a hypothesis as new evidence becomes available, providing a principled framework for reasoning under uncertainty.

6 min read · Updated May 2026
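The update described above can be illustrated with a discrete example. This is a minimal sketch, assuming a finite set of hypotheses with known priors P(H) and likelihoods P(E | H). The function name `bayes_update` and the fair-versus-biased coin scenario are hypothetical and used only for illustration.

```python
def bayes_update(priors, likelihoods):
    """Return the posterior P(H | E) for each hypothesis H.

    priors: dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(E | H) for the observed evidence E
    """
    # Numerator of Bayes' theorem: P(E | H) * P(H)
    unnormalised = {h: likelihoods[h] * p for h, p in priors.items()}
    # Denominator: P(E), the sum of P(E | H) * P(H) over all hypotheses
    evidence = sum(unnormalised.values())
    return {h: v / evidence for h, v in unnormalised.items()}


# Hypothetical example: is a coin fair, or biased towards heads with P(heads) = 0.9?
belief = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.9}

for flip in ["H", "H", "T"]:
    likelihoods = {h: p if flip == "H" else 1 - p for h, p in p_heads.items()}
    belief = bayes_update(belief, likelihoods)  # the posterior becomes the next prior
    print(flip, {h: round(v, 3) for h, v in belief.items()})
```

Each observation shifts belief towards whichever hypothesis better explains it. Two heads favour the biased coin, and the subsequent tail swings belief back towards the fair one.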
Foundations

Context Window

The maximum number of tokens a large language model can attend to within a single sequence, covering the prompt, prior conversation, retrieved documents, and the model's own generated output.

5 min read · Updated May 2026
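Because the prompt, conversation history, retrieved text, and generated output all draw on the same token budget, applications often drop the oldest conversation turns to make room. This is a minimal sketch of that idea. It assumes a hypothetical whitespace-based `count_tokens` in place of a model's real subword tokenizer, so the actual counts for any given model will differ. `fit_history` is likewise a hypothetical helper.

```python
def count_tokens(text):
    # Hypothetical stand-in: real models use a subword tokenizer, so counts differ.
    return len(text.split())


def fit_history(system_prompt, turns, window, reserve_for_output):
    """Keep the newest conversation turns that fit in the context window.

    The window must hold the system prompt, the kept turns, and the tokens
    reserved for the model's own output.
    """
    budget = window - reserve_for_output - count_tokens(system_prompt)
    if budget < 0:
        raise ValueError("system prompt and output reservation exceed the window")
    kept, used = [], 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # this turn and everything older no longer fits
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["one two three four", "five six seven", "eight nine"]
print(fit_history("You are a helpful assistant", history, window=16, reserve_for_output=5))
```

Reserving output tokens up front matters because generated tokens also consume the window.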
Infrastructure

Core ML

Core ML is Apple's on-device machine learning framework that enables iOS, macOS, watchOS, and tvOS applications to integrate pre-trained models for tasks including image classification, natural language processing, and sound analysis.

5 min read · Updated June 2026
Infrastructure

Edge AI

Edge AI is the deployment of artificial intelligence algorithms and inference workloads directly on local devices or edge computing nodes rather than in centralised cloud data centres, enabling low-latency, privacy-preserving, and bandwidth-efficient AI applications.

7 min read · Updated May 2026
Infrastructure

Feature Store

A centralised data platform for storing, serving, and managing machine learning features so that they can be reused consistently across training and online inference.

5 min read · Updated May 2026
Infrastructure

Inference (Machine Learning)

Inference is the phase in which a trained machine learning model is used to generate predictions or outputs from new input data, distinct from the earlier training phase.

5 min read · Updated May 2026
Infrastructure

Model Pruning

A model compression technique that removes redundant or low-importance parameters from a neural network to reduce its size, memory footprint, and inference latency while aiming to preserve accuracy.

6 min read · Updated June 2026
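One common way to decide which parameters are "low-importance" is to use their absolute magnitude. The following is a minimal sketch of unstructured magnitude pruning on a flat list of weights. It assumes that magnitude is the importance criterion, although other criteria exist, and `magnitude_prune` is a hypothetical name.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices ordered from least to most important (smallest magnitude first)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


layer = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4]
print(magnitude_prune(layer, sparsity=0.5))
```

Zeroed weights reduce file size or latency only when the storage format or runtime exploits the sparsity. For this reason, structured pruning, which removes whole neurons, channels, or heads, is often preferred when speed is the goal.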
Infrastructure

Model Serving

Model serving is the discipline of deploying trained machine learning models behind APIs or runtimes so that production applications can request predictions at scale with predictable latency, throughput, and reliability.

5 min read · Updated May 2026
Foundations

Monte Carlo Methods

A broad class of computational algorithms that use repeated random sampling to obtain numerical results, widely used in machine learning for Bayesian inference, reinforcement learning, and uncertainty estimation.

5 min read · Updated May 2026
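A classic illustration of "repeated random sampling to obtain numerical results" is estimating pi. The approach samples points uniformly in the unit square and counts how many land inside the quarter circle. This is a minimal sketch; the function name `estimate_pi` and the fixed seed are illustrative choices.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Monte Carlo estimate of pi from points sampled uniformly in the unit square."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    # The quarter circle covers pi/4 of the unit square's area.
    return 4.0 * inside / num_samples


for n in (100, 10_000, 200_000):
    print(n, estimate_pi(n))
```

The estimate's standard error shrinks roughly in proportion to 1/sqrt(n). The same principle underlies Monte Carlo approximations of Bayesian posteriors and of expected returns in reinforcement learning.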
Infrastructure

ONNX (Open Neural Network Exchange)

An open standard format for representing machine learning models that enables interoperability between deep learning frameworks, runtimes, and hardware platforms.

5 min read · Updated May 2026
Infrastructure

OpenVINO

OpenVINO is an open-source toolkit developed by Intel for optimising and deploying deep learning inference across Intel hardware, including CPUs, GPUs, Neural Processing Units, and FPGAs, with broad support for major AI frameworks and model formats.

6 min read · Updated June 2026
Malaysian Context

PDPA AI Compliance

PDPA AI compliance refers to the application of Malaysia's Personal Data Protection Act 2010 to artificial intelligence systems, governing how personal data may be collected, processed, and used in AI training, inference, and deployment.

6 min read · Updated May 2026
Infrastructure

Quantisation

Quantisation is a model compression technique that reduces the numerical precision of a neural network's weights and activations, for example from 32-bit floating point to 8-bit integers, decreasing memory usage and accelerating inference, often with only a small loss of accuracy.

7 min read · Updated May 2026
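The precision reduction can be illustrated with symmetric, per-tensor int8 quantisation. A single float scale maps the largest weight magnitude to 127, and every weight is stored as a rounded integer code. This is a minimal sketch under that assumption. Production toolkits also use per-channel scales, zero points, and calibration data. `quantise_int8` and `dequantise_int8` are hypothetical names.

```python
def quantise_int8(weights):
    """Symmetric per-tensor int8 quantisation: returns integer codes and a float scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # largest magnitude maps to +/-127
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantise_int8(codes, scale):
    """Approximately recover the original floats from int8 codes."""
    return [c * scale for c in codes]


weights = [0.42, -1.27, 0.003, 0.88, -0.5]
codes, scale = quantise_int8(weights)
restored = dequantise_int8(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each code fits in one byte instead of the four bytes used by a float32, and the rounding error per weight is at most half the scale. Very small weights, such as 0.003 here, collapse to zero. This is one source of the accuracy loss that quantisation-aware methods try to limit.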
Applications

Retrieval-Augmented Generation

A technique that enhances large language model outputs by retrieving relevant documents from an external knowledge base at inference time, grounding responses in up-to-date and domain-specific information.

6 min read · Updated May 2026
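The retrieve-then-generate flow can be sketched in a few lines. In this toy version, bag-of-words vectors and cosine similarity stand in for a learned embedding model and a vector database. The example stops at building the grounded prompt that would be sent to the model. `embed`, `retrieve`, and `build_prompt` are hypothetical names, and the documents are made-up examples.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, documents, k=1):
    """Ground the model's answer in the retrieved context."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "The Personal Data Protection Act 2010 governs personal data in Malaysia",
    "TPUs are custom accelerators designed by Google",
    "Quantisation lowers the numerical precision of model weights",
]
print(build_prompt("Which act governs personal data in Malaysia?", docs))
```

Because retrieval happens at inference time, updating the knowledge base changes the answers without retraining the model.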
Infrastructure

Tensor Processing Unit

A tensor processing unit (TPU) is a custom application-specific integrated circuit developed by Google for accelerating machine learning workloads, particularly neural network training and inference.

4 min read · Updated May 2026
Infrastructure

TensorFlow Lite

TensorFlow Lite is an open-source deep learning framework from Google for running optimised machine learning models on mobile phones, microcontrollers, and other edge devices.

5 min read · Updated June 2026