AIWiki Malaysia

Search Results

6 results for "latency"

Infrastructure

Edge AI

Edge AI is the deployment of artificial intelligence algorithms and inference workloads directly on local devices or edge computing nodes rather than in centralised cloud data centres, enabling low-latency, privacy-preserving, and bandwidth-efficient AI applications.

7 min read · Updated May 2026
Infrastructure

Inference (Machine Learning)

Inference is the phase in which a trained machine learning model is used to generate predictions or outputs from new input data, distinct from the earlier training phase.

5 min read · Updated May 2026
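To make the training/inference split concrete, here is a minimal sketch, assuming a toy one-variable linear model fitted by ordinary least squares. The function names `train` and `infer` are hypothetical. Training runs once and produces parameters. Inference only applies those frozen parameters to new inputs and does no learning.

```python
def train(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Training phase: fit y ~ slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def infer(params: tuple[float, float], new_xs: list[float]) -> list[float]:
    """Inference phase: apply the frozen parameters to unseen inputs (no updates)."""
    slope, intercept = params
    return [slope * x + intercept for x in new_xs]


params = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # data follows y = 2x + 1
print(infer(params, [10.0]))  # → [21.0]
```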
Infrastructure

Model Pruning

Model pruning is a model compression technique that removes redundant or low-importance parameters from a neural network to reduce its size, memory footprint, and inference latency while largely preserving accuracy.

6 min read · Updated June 2026
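As an illustration only, the simplest variant, unstructured magnitude pruning, can be sketched as follows. It assumes that a weight's absolute value is a reasonable proxy for its importance and that the weight matrix is rectangular. The function name `magnitude_prune` is hypothetical and is not taken from any framework.

```python
def magnitude_prune(weights: list[list[float]], sparsity: float) -> list[list[float]]:
    """Return a copy of `weights` with the `sparsity` fraction of smallest-magnitude entries set to 0.0."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    cols = len(weights[0]) if weights else 0
    flat = [abs(w) for row in weights for w in row]
    k = int(len(flat) * sparsity)  # how many weights to remove
    # Rank flattened positions by magnitude (stable sort, so ties break by position);
    # the first k positions are treated as least important.
    order = sorted(range(len(flat)), key=lambda i: flat[i])
    to_prune = set(order[:k])
    return [
        [0.0 if r * cols + c in to_prune else w for c, w in enumerate(row)]
        for r, row in enumerate(weights)
    ]


w = [[0.5, -0.05, 1.2], [0.01, -0.8, 0.3]]
print(magnitude_prune(w, 0.5))  # → [[0.5, 0.0, 1.2], [0.0, -0.8, 0.0]]
```

Zeroing weights only shrinks memory or latency in practice when the result is stored or executed in a sparse-aware format. Pruned models are also commonly fine-tuned afterwards to recover accuracy.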
Infrastructure

Model Serving

Model serving is the discipline of deploying trained machine learning models behind APIs or runtimes so that production applications can request predictions at scale with predictable latency, throughput, and reliability.

5 min read · Updated May 2026
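A minimal sketch of putting a model "behind an API", using only Python's standard WSGI interface. The linear "model", its weights, and the JSON request shape `{"features": [...]}` are all hypothetical. A production server would also need batching, monitoring, and scaling, which this sketch omits.

```python
import json

# Hypothetical "trained" model: a linear predictor with fixed weights.
WEIGHTS = [0.4, -1.2]
BIAS = 0.5


def predict(features: list[float]) -> float:
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def app(environ, start_response):
    """WSGI endpoint: POST {"features": [...]} and receive {"prediction": ...} as JSON."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"use POST"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        features = [float(x) for x in payload["features"]]
        if len(features) != len(WEIGHTS):
            raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed requests get a 400 response instead of crashing the worker.
        body = json.dumps({"error": str(exc)}).encode()
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [body]
    body = json.dumps({"prediction": predict(features)}).encode()
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` would serve the endpoint on port 8000 (an arbitrary choice).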
Infrastructure

Neural Architecture Search

Neural architecture search is the automated design of neural network architectures using search algorithms, reinforcement learning, or gradient-based methods to discover models that meet target accuracy, latency, and size constraints.

5 min read · Updated May 2026
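A toy sketch of the simplest search strategy, random search, under a latency constraint. The search space (`depth`, `width`), the latency cost model, and the accuracy proxy are all hypothetical placeholders. A real NAS system would train or partially train each candidate (or use a learned predictor) and measure latency on the target hardware.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arch:
    depth: int  # number of layers (hypothetical search dimension)
    width: int  # units per layer (hypothetical search dimension)


def estimated_latency_ms(a: Arch) -> float:
    # Hypothetical cost model: latency grows with depth * width^2.
    return 0.001 * a.depth * a.width ** 2


def proxy_accuracy(a: Arch) -> float:
    # Hypothetical stand-in for "train and evaluate": more capacity helps, with diminishing returns.
    capacity = a.depth * a.width
    return 1.0 - 1.0 / (1.0 + capacity / 256)


def random_search(space: list[Arch], budget_ms: float, trials: int, seed: int = 0) -> Optional[Arch]:
    """Sample candidates at random; keep the most accurate one that meets the latency budget."""
    rng = random.Random(seed)
    best: Optional[Arch] = None
    for _ in range(trials):
        cand = rng.choice(space)
        if estimated_latency_ms(cand) > budget_ms:
            continue  # reject: violates the latency constraint
        if best is None or proxy_accuracy(cand) > proxy_accuracy(best):
            best = cand
    return best


SPACE = [Arch(d, w) for d, w in itertools.product([2, 4, 8], [32, 64, 128, 256])]
best = random_search(SPACE, budget_ms=10.0, trials=500)
print(best, estimated_latency_ms(best) if best else None)
```

Random search is a common baseline. The reinforcement-learning and gradient-based methods named above replace `rng.choice` with a strategy that learns which regions of the space are promising.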
Companies & Tools

Pinecone

Pinecone is a managed, cloud-native vector database designed for storing high-dimensional embeddings and serving low-latency similarity search for retrieval-augmented AI applications.

5 min read · Updated May 2026
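Pinecone's own client API is not shown here. As a hedged illustration of the underlying operation only, the sketch below performs exact (brute-force) cosine-similarity search over an in-memory dictionary. The names `cosine` and `top_k` and the toy vectors are hypothetical.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbour search: score every stored vector and keep the k best."""
    scored = ((cosine(query, vec), vec_id) for vec_id, vec in vectors.items())
    return [(vec_id, score) for score, vec_id in heapq.nlargest(k, scored)]


store = {"doc-a": [1.0, 0.0], "doc-b": [0.7, 0.7], "doc-c": [0.0, 1.0]}
print(top_k([1.0, 0.1], store, k=2))
```

This sketch scans every vector, so its cost grows linearly with collection size. Vector databases generally use approximate nearest-neighbour indexes instead to keep query latency low at scale.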