AIWiki

Search Results

8 results for "serving"

Infrastructure

Edge AI

Edge AI is the deployment of artificial intelligence algorithms and inference workloads directly on local devices or edge computing nodes rather than in centralised cloud data centres, enabling low-latency, privacy-preserving, and bandwidth-efficient AI applications.

7 min read · Updated May 2026
Infrastructure

Feature Store

A feature store is a centralised data platform for storing, serving, and managing machine learning features so that they can be reused consistently across training and online inference.

5 min read · Updated May 2026
Foundations

Federated Learning

Federated learning is a machine learning paradigm in which a model is trained across multiple decentralised devices or servers holding local data, without exchanging the raw data itself, preserving privacy while enabling collaborative model improvement.

6 min read · Updated May 2026
Infrastructure

Inference (Machine Learning)

Inference is the phase in which a trained machine learning model is used to generate predictions or outputs from new input data, distinct from the earlier training phase.

5 min read · Updated May 2026
Infrastructure

Model Compression

Model compression is a set of techniques that reduce the size, memory footprint, and computational cost of machine learning models while preserving predictive accuracy, enabling deployment on resource-constrained hardware.

6 min read · Updated June 2026
Infrastructure

Model Pruning

Model pruning is a model compression technique that removes redundant or low-importance parameters from a neural network to reduce size, memory footprint, and inference latency while preserving accuracy.

6 min read · Updated June 2026
Infrastructure

Model Serving

Model serving is the discipline of deploying trained machine learning models behind APIs or runtimes so that production applications can request predictions at scale with predictable latency, throughput, and reliability.

5 min read · Updated May 2026
Companies & Tools

Pinecone

Pinecone is a managed, cloud-native vector database designed for storing high-dimensional embeddings and serving low-latency similarity search for retrieval-augmented AI applications.

5 min read · Updated May 2026