Knowledge Distillation
Knowledge distillation is a model compression technique in which a smaller student neural network is trained to replicate the behaviour of a larger, more capable teacher model, typically by matching the teacher's softened output distributions ("soft targets") alongside the ground-truth labels. This enables deployment of efficient models that approximate teacher-level performance.
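As an illustration, the soft-target objective described above can be sketched for a single example in plain Python. This is a minimal sketch, assuming the widely used formulation that blends a temperature-scaled KL divergence (teacher to student) with ordinary cross-entropy on the hard label; `distillation_loss`, `temperature` and `alpha` are hypothetical names chosen for illustration, and real training code would compute this over batches in a deep learning framework.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """Hypothetical helper: blend soft-target KL loss with hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions; scaling by T^2
    # keeps gradient magnitudes comparable as the temperature changes.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft_loss = kl * temperature ** 2
    # Ordinary cross-entropy against the ground-truth label at temperature 1.
    hard_loss = -math.log(softmax(student_logits)[label])
    return alpha * soft_loss + (1 - alpha) * hard_loss


teacher = [2.0, 1.0, 0.1]
print(distillation_loss([0.0, 0.0, 3.0], teacher, label=0))
```

A higher temperature flattens the teacher's distribution, exposing how it ranks the incorrect classes; this "dark knowledge" is what the student gains beyond the hard labels.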
Model Compression
Model compression is a set of techniques that reduce the size, memory footprint, and computational cost of machine learning models while largely preserving predictive accuracy, enabling deployment on resource-constrained hardware.
Model Pruning
Model pruning is a model compression technique that removes redundant or low-importance parameters from a neural network to reduce size, memory footprint, and inference latency while largely preserving accuracy.
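One common way to decide which parameters are "low-importance" is weight magnitude. The sketch below shows unstructured magnitude pruning on a flat list of weights: the fraction `sparsity` of weights with the smallest absolute values is set to zero. This is an illustrative assumption rather than the only pruning criterion (gradient- and structure-based criteria also exist), and `magnitude_prune` is a hypothetical name.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Hypothetical helper: zero out the smallest-magnitude fraction of weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    # Keep surviving weights unchanged; pruned positions become exact zeros.
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
print(magnitude_prune(w, 0.5))  # [0.9, 0.0, 0.3, -0.7, 0.0, 0.0]
```

Zeroing weights alone does not shrink memory or latency. The gains come from storing the result in a sparse format, running it on hardware that exploits sparsity, or using structured pruning that removes whole neurons, channels, or heads.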
Quantisation
Quantisation is a model compression technique that reduces the numerical precision of a neural network's weights and activations from high-bit floating-point formats (such as 32-bit floats) to lower-bit representations (such as 8-bit integers). This decreases memory usage and, on hardware with low-precision support, accelerates inference, typically with only a small loss in accuracy.
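To make the precision reduction concrete, the sketch below applies per-tensor affine (asymmetric) quantisation to 8-bit unsigned integers and then dequantises back to floats. It assumes one common scheme: a `scale` maps the observed value range onto 0 to 255, and an integer `zero_point` ensures that 0.0 is represented exactly. The function names are hypothetical. Production toolchains add calibration, per-channel scales, and quantised kernels on top of this idea.

```python
def quantize_uint8(values: list[float]) -> tuple[list[int], float, int]:
    """Hypothetical helper: map floats to 0..255 with a shared scale and zero point."""
    # Extend the range to include 0.0 so it can be represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    # Round to the nearest grid step, shift by the zero point, and clamp to uint8.
    q = [min(255, max(0, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_uint8(q: list[int], scale: float, zero_point: int) -> list[float]:
    """Approximate reconstruction; the error per value is at most about scale / 2."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.4, 2.0]
q, scale, zp = quantize_uint8(weights)
print(q, zp)
print(dequantize_uint8(q, scale, zp))
```

Each stored value shrinks from 4 bytes (float32) to 1 byte, roughly a 4x reduction for the tensor, plus a small constant overhead for `scale` and `zero_point`.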