Search Results
8 results for “serving”
Edge AI
Edge AI is the deployment of artificial intelligence algorithms and inference workloads directly on local devices or edge computing nodes rather than in centralised cloud data centres, enabling low-latency, privacy-preserving, and bandwidth-efficient AI applications.
Feature Store
A feature store is a centralised data platform for storing, serving, and managing machine learning features so that they can be reused consistently across model training and online inference.
Federated Learning
Federated learning is a machine learning paradigm in which a model is trained across multiple decentralised devices or servers holding local data, without exchanging the raw data itself, preserving privacy while enabling collaborative model improvement.
Inference (Machine Learning)
Inference is the phase in which a trained machine learning model is used to generate predictions or outputs from new input data, distinct from the earlier training phase.
Model Compression
Model compression is a set of techniques that reduce the size, memory footprint, and computational cost of machine learning models while preserving as much predictive accuracy as possible, enabling deployment on resource-constrained hardware.
Model Pruning
Model pruning is a model compression technique that removes redundant or low-importance parameters from a neural network to reduce its size, memory footprint, and inference latency while largely preserving accuracy.
Model Serving
Model serving is the discipline of deploying trained machine learning models behind APIs or runtimes so that production applications can request predictions at scale with predictable latency, throughput, and reliability.
Pinecone
Pinecone is a managed, cloud-native vector database designed for storing high-dimensional embeddings and serving low-latency similarity search for retrieval-augmented AI applications.