TinyML
TinyML is a field of machine learning focused on running machine learning models on microcontrollers and other resource-constrained edge devices that typically operate with milliwatts of power and kilobytes of memory.
TinyML refers to the practice of designing, optimising, and deploying machine learning models on microcontrollers and other extremely resource-constrained embedded devices. Where mainstream machine learning targets servers with gigabytes of memory and power budgets of tens to hundreds of watts, TinyML targets devices that can run from coin-cell batteries, average on the order of a milliwatt or less, and execute on systems-on-chip that typically have under a megabyte of flash memory and tens to hundreds of kilobytes of RAM. The term was coined and popularised by Pete Warden and others at Google and Harvard from around 2018, building on earlier work on always-on keyword spotting in smartphones.
Hardware Platforms
The dominant target hardware family is the ARM Cortex-M series of 32-bit microcontroller cores, including the Cortex-M0+, M4, M7, M33, and M55. ARM licenses these core designs to chip vendors such as STMicroelectronics, NXP, Nordic Semiconductor, Renesas, Silicon Labs, and Ambiq, which build them into microcontrollers. The Cortex-M55, which adds the Helium vector extension, and the companion ARM Ethos-U55 microNPU, both announced in 2020, marked a step change by bringing vector processing and dedicated neural network acceleration within microcontroller power and silicon budgets.
Other relevant platforms include the Espressif ESP32 family, which combines Wi-Fi and Bluetooth radios with single- or dual-core processors capable of running TinyML workloads; the Raspberry Pi RP2040, a dual-core Cortex-M0+ microcontroller used in low-cost prototyping; and cores based on the open RISC-V instruction set, such as those from SiFive and Andes. Specialised accelerator chips from Syntiant, GreenWaves Technologies (the GAP9 processor), and Maxim Integrated target specific TinyML applications such as always-on audio detection.
Software and Frameworks
TensorFlow Lite for Microcontrollers (TensorFlow Lite Micro), released in 2019, is the most widely used inference runtime for TinyML. It is written in portable C++ with no operating system or dynamic memory allocation dependencies, making it suitable for bare-metal microcontroller environments. Edge Impulse provides an end-to-end SaaS platform for TinyML, covering data collection, model training, optimisation, and deployment to a wide range of supported devices. ST's X-CUBE-AI converts trained models into optimised C code targeting STM32 microcontrollers, and Arduino's machine learning ecosystem provides accessible entry points for hobbyists and educators.
Common model optimisation techniques include post-training quantisation to 8-bit or 4-bit integer arithmetic, weight pruning to remove insignificant connections, knowledge distillation to transfer capability from a large teacher model to a small student, and operator fusion to reduce memory accesses. Quantising 32-bit floating-point weights to 8-bit integers alone cuts weight storage roughly fourfold; combined with pruning and distillation, these techniques can reduce a model's footprint by an order of magnitude while preserving most of its accuracy.
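As an illustration of the quantisation step, the following is a minimal sketch of affine (asymmetric) 8-bit quantisation in plain Python. The function names `quantize_int8` and `dequantize` are hypothetical and are not taken from any particular framework; production toolchains typically add per-channel scales and calibration data, which this sketch omits.

```python
def quantize_int8(weights):
    """Map a list of floats to int8 values plus a scale and zero point."""
    lo, hi = min(weights), max(weights)
    # Widen the range to include zero so that 0.0 stays exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    # 255 steps span the int8 range [-128, 127]; fall back to 1.0 for all-zero input.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return [(v - zero_point) * scale for v in q]


weights = [-0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

# Storage: 4 bytes per float32 weight versus 1 byte per int8 weight.
compression_ratio = (4 * len(weights)) / (1 * len(q))
```

Each restored value differs from the original by at most about one quantisation step (`scale`), which is the accuracy cost traded for the fourfold reduction in weight storage.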
Applications
Keyword Spotting and Voice Activation
Always-on keyword detection — recognising phrases such as "Hey Google" or "Alexa" — is the canonical TinyML workload. Compact convolutional or recurrent models running on dedicated low-power cores listen continuously while the main application processor remains in deep sleep, waking the device only when a relevant phrase is detected.
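The wake decision described above can be sketched as a small gate that smooths per-frame keyword scores before waking the main processor. This is an illustrative sketch only: the class name `WakeWordGate`, the window length, and the threshold are assumptions, and the per-frame scores stand in for the output of a keyword model that is not shown.

```python
from collections import deque


class WakeWordGate:
    """Smooths per-frame keyword scores and decides when to wake the host."""

    def __init__(self, window=5, threshold=0.8):
        self.scores = deque(maxlen=window)  # most recent scores only
        self.threshold = threshold

    def update(self, score):
        """Feed one per-frame keyword probability; return True to wake."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough history yet
        return sum(self.scores) / len(self.scores) >= self.threshold


gate = WakeWordGate(window=3, threshold=0.8)
# Hypothetical model outputs: one isolated spike, then a sustained detection.
frame_scores = [0.1, 0.95, 0.2, 0.9, 0.9, 0.9]
decisions = [gate.update(s) for s in frame_scores]
```

Averaging over a short window means a single high-scoring frame (the 0.95 above) does not wake the device; only the sustained run of high scores at the end does.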
Anomaly Detection in Industrial Equipment
Vibration, acoustic, and current sensors attached to industrial motors, pumps, and HVAC equipment generate continuous data streams. TinyML models running on board the sensor detect anomalous patterns indicating bearing wear, imbalance, or impending failure, enabling predictive maintenance without streaming raw data to the cloud.
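A minimal sketch of on-sensor anomaly flagging follows. It uses a running z-score (Welford's online mean and variance) as a simple statistical stand-in for a learned model; deployed systems often use a trained autoencoder or classifier instead. The class name, threshold, and warm-up length are illustrative assumptions.

```python
import math


class RunningAnomalyDetector:
    """Flags readings that deviate strongly from the running baseline."""

    def __init__(self, z_threshold=4.0, warmup=20):
        self.n = 0          # number of baseline samples seen
        self.mean = 0.0     # running mean of baseline samples
        self.m2 = 0.0       # running sum of squared deviations
        self.z_threshold = z_threshold
        self.warmup = warmup

    def update(self, x):
        """Return True if x is anomalous relative to readings seen so far."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.z_threshold:
                anomalous = True
        if not anomalous:
            # Welford update; anomalies are excluded so they do not shift the baseline.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return anomalous


detector = RunningAnomalyDetector()
# Hypothetical vibration RMS readings hovering around 1.0.
baseline = [1.0 + 0.01 * ((i % 5) - 2) for i in range(50)]
baseline_flags = [detector.update(v) for v in baseline]
spike_flag = detector.update(2.0)    # large excursion, e.g. sudden imbalance
normal_flag = detector.update(1.01)  # back within the usual range
```

The detector keeps only three numbers of state, so it fits comfortably in microcontroller RAM, and only the flag, not the raw sensor stream, needs to be transmitted.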
Gesture and Activity Recognition
Wearables, smart remotes, and toys use TinyML to recognise gestures and physical activities from accelerometer and gyroscope data, with all inference performed locally on a battery-powered device.
Visual Wake Words
Small image classification models distinguish whether a person is present in a camera frame, enabling battery-powered cameras and doorbells to remain in low-power mode and only transmit when someone is detected.
Environmental Monitoring
Soil moisture sensors, air-quality monitors, and wildlife camera traps deployed in remote locations use TinyML to filter, classify, and summarise readings on-device, reducing the bandwidth and power required to transmit data to a base station.
Benefits and Trade-offs
TinyML offers several advantages over cloud-based machine learning: low latency, since inference happens locally; data privacy, since raw sensor data need not leave the device; bandwidth efficiency, since only inferences or summaries are transmitted; and battery life measured in months or years on a small primary cell. The trade-offs are reduced model capacity, the need for careful optimisation, and increased deployment complexity for over-the-air updates and lifecycle management.
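The battery-life claim can be made concrete with a back-of-the-envelope duty-cycle calculation. All figures below are illustrative assumptions, not measurements of any specific device: a 225 mAh coin cell, 5 mA while running inference for 20 ms each second, and 2 µA in sleep. The estimate ignores self-discharge, radio use, and the pulse-current limits of small cells.

```python
def average_current_ma(active_ma, sleep_ma, active_ms, period_ms):
    """Time-weighted average current for a device that wakes periodically."""
    duty = active_ms / period_ms
    return active_ma * duty + sleep_ma * (1.0 - duty)


def battery_life_days(capacity_mah, avg_ma):
    """Idealised battery life: capacity divided by average draw."""
    return capacity_mah / avg_ma / 24.0


# Assumed figures: 20 ms of inference at 5 mA once per second, 2 uA sleep current.
avg_ma = average_current_ma(active_ma=5.0, sleep_ma=0.002, active_ms=20, period_ms=1000)
days = battery_life_days(capacity_mah=225.0, avg_ma=avg_ma)
```

Under these assumptions the average draw is about 0.1 mA and the cell lasts roughly three months; running inference once every ten seconds instead would stretch the same cell to well over a year, which is why duty cycling matters as much as model size.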
References
- Warden, P., and Situnayake, D. (2019). TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers. O'Reilly Media.
- David, R. et al. (2021). TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems. Proceedings of Machine Learning and Systems, 3, 800–811.
- Banbury, C. R. et al. (2021). Benchmarking TinyML Systems: Challenges and Direction. arXiv:2003.04821.
- Malaysia Digital Economy Corporation. (2023). Industry4WRD National Policy on Industry 4.0. MITI Malaysia.
- ARM Limited. (2024). Cortex-M Processor Family Technical Reference. Cambridge: ARM.