Feature Store
A centralised data platform for storing, serving, and managing machine learning features so that they can be reused consistently across training and online inference.
Feature stores are data systems purpose-built for machine learning that store, version, and serve features — the curated inputs to a model — through a single source of truth shared by data scientists, ML engineers, and online inference services. The pattern was popularised by Uber's Michelangelo platform in 2017 and has since become a standard component of mature MLOps stacks for fraud detection, recommendation, dynamic pricing, and real-time personalisation.
Why feature stores exist
Production machine learning suffers from two recurring problems. The first is training-serving skew: features computed in batch jobs for training diverge from the way the same features are computed in a low-latency online service, silently degrading model quality. The second is duplication of effort: each team rebuilds the same join, aggregation, or windowed average against the warehouse, producing slightly different versions of a single concept such as user lifetime spend. A feature store addresses both by defining each feature once, materialising it to an offline store for training and an online store for inference, and keeping the two consistent.
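The "define once, materialise twice" idea can be illustrated with a minimal sketch in plain Python. It is not how any particular feature store is implemented: the event data, the `lifetime_spend` function, and the two `materialise_*` helpers are hypothetical names chosen for illustration, and in-memory structures stand in for a real warehouse and key-value store.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (user_id, event_time, amount).
EVENTS = [
    ("u1", datetime(2024, 1, 1), 20.0),
    ("u1", datetime(2024, 1, 5), 30.0),
    ("u2", datetime(2024, 1, 3), 15.0),
]


def lifetime_spend(events, as_of):
    """Single feature definition: total spend per user up to and including as_of."""
    totals = defaultdict(float)
    for user_id, event_time, amount in events:
        if event_time <= as_of:
            totals[user_id] += amount
    return dict(totals)


def materialise_offline(events, snapshot_times):
    """Offline store stand-in: one row per (user, snapshot time), kept for training."""
    rows = []
    for ts in snapshot_times:
        for user_id, value in sorted(lifetime_spend(events, ts).items()):
            rows.append({"user_id": user_id, "as_of": ts, "lifetime_spend": value})
    return rows


def materialise_online(events, now):
    """Online store stand-in: only the latest value per user, keyed by user_id."""
    return lifetime_spend(events, now)


snapshots = [datetime(2024, 1, 2), datetime(2024, 1, 6)]
offline_rows = materialise_offline(EVENTS, snapshots)
online_store = materialise_online(EVENTS, datetime(2024, 1, 6))
```

Because both stores are filled by the same `lifetime_spend` function, the value a model sees at inference time is computed by the same logic as the values it was trained on, which is the property that limits training-serving skew.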
Architecture
A typical feature store has four layers. Feature definitions are declarative specifications, usually written in Python or SQL, that describe how a feature is computed from upstream sources. Transformation engines run those definitions in batch using Spark, dbt, or Snowflake, and in streaming mode using engines such as Flink or Spark Structured Streaming that read from Kafka or Kinesis. The offline store keeps historical feature values, typically in a columnar warehouse such as BigQuery, Snowflake, Databricks Delta, or Parquet on object storage, and supports point-in-time joins to assemble training sets without label leakage. The online store, usually a low-latency key-value system such as Redis, DynamoDB, Aerospike, or Cassandra, serves the latest feature values to inference services, typically with single-digit millisecond latency.
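A point-in-time join can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than any product's implementation: the `FEATURE_HISTORY` and `LABELS` data and the `value_as_of` and `build_training_set` helpers are hypothetical, and the sketch assumes a feature value timestamped exactly at the label time counts as known (a stricter system might require it to be strictly earlier).

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical offline store: per-user feature history, sorted by timestamp.
FEATURE_HISTORY = {
    "u1": [
        (datetime(2024, 1, 1), 20.0),
        (datetime(2024, 1, 5), 50.0),
        (datetime(2024, 1, 9), 80.0),
    ],
}

# Hypothetical label events: (user_id, label_time, label).
LABELS = [
    ("u1", datetime(2024, 1, 4), 0),
    ("u1", datetime(2024, 1, 7), 1),
    ("u2", datetime(2024, 1, 7), 0),
]


def value_as_of(history, ts):
    """Return the latest feature value whose timestamp is <= ts, or None if there is none."""
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)
    return history[idx - 1][1] if idx > 0 else None


def build_training_set(labels, feature_history):
    """Attach to each label the feature value that was known at the label's timestamp."""
    rows = []
    for user_id, label_time, label in labels:
        history = feature_history.get(user_id, [])
        rows.append({
            "user_id": user_id,
            "label_time": label_time,
            "lifetime_spend": value_as_of(history, label_time),
            "label": label,
        })
    return rows


training_set = build_training_set(LABELS, FEATURE_HISTORY)
```

The label at 7 January is paired with the value written on 5 January, not the later value from 9 January, so no information from after the label time leaks into the training row. Users with no history, such as `u2` here, get a missing value instead of a guess.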
Major implementations
Feast is a widely used open-source feature store, originally built at Gojek and now maintained as a community project. Its modular architecture lets teams plug in their choice of warehouse, online store, and compute engine, which makes it a common choice for organisations that want to avoid vendor lock-in. Tecton, founded by members of the Uber Michelangelo team, is a fully managed enterprise platform with strong support for streaming features defined against Kafka and Kinesis. Hopsworks offers an open-core platform with a polished UI, native lineage tracking, and integrations with Spark and Flink. The major cloud providers and data platforms (Amazon SageMaker, Databricks, Google Vertex AI, and Snowflake) have added their own feature store offerings that integrate tightly with the rest of their stacks.
| Platform | Licence | Streaming support | Best fit |
| --- | --- | --- | --- |
| Feast | Open source (Apache 2.0) | Via Spark or external | Teams that own their stack |
| Tecton | Commercial / managed | First-class | Real-time use cases at enterprise scale |
| Hopsworks | Open core | Native (Flink) | On-prem and regulated industries |
| Databricks Feature Store | Bundled commercial | Via Structured Streaming | Existing Databricks users |
When feature stores are worth the cost
Feature stores add operational complexity and are not necessary for every team. They become valuable when an organisation has multiple ML models in production, requires sub-second online predictions, needs to reuse features across teams, or must demonstrate point-in-time correctness to regulators. A team running a single batch model on a weekly retraining cadence usually does not need a dedicated feature store.
References
- Hermann, J., and Del Balso, M. (2017). Meet Michelangelo: Uber's Machine Learning Platform. Uber Engineering Blog.
- Feast Authors (2025). Feast: An Open Source Feature Store for Machine Learning. feast.dev.
- Tecton (2025). The Definitive Guide to Feature Stores. tecton.ai.
- Bank Negara Malaysia (2023). Risk Management in Technology (RMiT). bnm.gov.my.