DataOps

DataOps is an engineering methodology that applies agile, DevOps, and lean manufacturing principles to data pipelines, aiming for rapid, reliable, and repeatable delivery of analytics and machine learning data.

4 min read | Last updated June 2026 | Infrastructure

DataOps (a portmanteau of "data" and "operations") is a methodology and set of practices for delivering data and analytics at production quality with the speed and reliability that modern businesses expect. It applies the principles of DevOps, agile software development, and lean manufacturing to the full data lifecycle: ingestion, transformation, quality testing, deployment, and monitoring of pipelines that feed dashboards, machine learning models, and operational systems.

Origin and definition

The term was coined by analyst Lenny Liebmann in 2014 and popularised by the DataKitchen team and others who observed that traditional data warehousing projects suffered from the same coordination and quality problems that DevOps had addressed in software engineering. The DataOps Manifesto, published in 2017, codified 18 principles that emphasise continuous delivery of analytic insights, treating analytics as code, and building quality measurement into pipelines.

Core practices

A mature DataOps practice typically combines several disciplines:

| Practice | Purpose |
|---|---|
| Pipeline as code | Define ingestion and transformation in version-controlled SQL or Python |
| Orchestration | Schedule and monitor DAGs using Airflow, Dagster, Prefect, or Argo |
| Data quality testing | Assert schemas, freshness, row counts, and business rules (Great Expectations, dbt tests, Soda) |
| Environment promotion | Promote changes through development, staging, and production environments, each with isolated data |
| Observability | Track lineage, detect anomalies in data volumes, and enforce freshness SLAs |
| Catalogue and contracts | Document datasets, owners, and producer-consumer contracts |
| Incident response | Treat broken pipelines as production incidents with postmortems |
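
The data quality testing practice can be illustrated with a minimal sketch in plain Python. It is not how Great Expectations, dbt tests, or Soda are implemented; the `orders` columns, the 24-hour freshness limit, and the `check_batch` helper are all hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expectations for an illustrative "orders" batch.
EXPECTED_COLUMNS = {"order_id": int, "amount": float, "loaded_at": datetime}
MAX_STALENESS = timedelta(hours=24)
MIN_ROWS = 1


def check_batch(rows, now):
    """Return a list of failure messages; an empty list means the batch passed."""
    failures = []

    # Row count: an empty load usually signals an upstream problem.
    if len(rows) < MIN_ROWS:
        failures.append(f"row count {len(rows)} is below minimum {MIN_ROWS}")
        return failures

    # Schema: every row must have exactly the expected columns and types.
    for i, row in enumerate(rows):
        if set(row) != set(EXPECTED_COLUMNS):
            failures.append(f"row {i}: columns {sorted(row)} do not match schema")
            continue
        for name, expected_type in EXPECTED_COLUMNS.items():
            if not isinstance(row[name], expected_type):
                failures.append(f"row {i}: {name} is not {expected_type.__name__}")

    # The remaining checks rely on a valid schema, so stop if it is broken.
    if failures:
        return failures

    # Freshness: the newest row must be recent enough.
    newest = max(row["loaded_at"] for row in rows)
    if now - newest > MAX_STALENESS:
        failures.append(f"newest row is older than {MAX_STALENESS}")

    # Business rule: order amounts must not be negative.
    for i, row in enumerate(rows):
        if row["amount"] < 0:
            failures.append(f"row {i}: negative amount {row['amount']}")

    return failures
```

In a pipeline, a non-empty result would typically block promotion of the batch to downstream consumers rather than merely log a warning.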

The supporting tool stack usually includes a cloud data warehouse or lakehouse (Snowflake, BigQuery, Databricks, Redshift, ClickHouse), a transformation layer (dbt, SQLMesh, Spark), an orchestration engine, a catalogue (Atlan, DataHub, OpenMetadata, Unity Catalog), and an observability platform (Monte Carlo, Bigeye, Acceldata, Sifflet).
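
As a rough illustration of what an orchestration engine does at its core, the sketch below runs a small pipeline as a directed acyclic graph (DAG) using only the Python standard library (`graphlib`, available from Python 3.9). It is not the API of Airflow, Dagster, Prefect, or Argo; the task names and the `run_pipeline` helper are hypothetical, and real orchestrators add scheduling, retries, and monitoring on top of this ordering step.

```python
from graphlib import TopologicalSorter


def ingest(state):
    # Stand-in for reading from a source system.
    state["raw"] = [3, 1, 2]


def transform(state):
    state["clean"] = sorted(state["raw"])


def check_quality(state):
    # A failing check raises, which stops the run before publishing.
    if state["clean"] != sorted(state["clean"]):
        raise ValueError("transformed data must be sorted")


def publish(state):
    state["published"] = list(state["clean"])


TASKS = {
    "ingest": ingest,
    "transform": transform,
    "check_quality": check_quality,
    "publish": publish,
}

# Each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "ingest": set(),
    "transform": {"ingest"},
    "check_quality": {"transform"},
    "publish": {"check_quality"},
}


def run_pipeline():
    """Run every task in dependency order and return the order and final state."""
    state = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](state)
    return order, state
```

Placing the quality check between the transformation and the publish step means a bad batch fails the run instead of reaching consumers.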

Relationship to DevOps and MLOps

DataOps shares with DevOps a commitment to automation, continuous integration, version control, and shared accountability between teams. It differs in the centrality of data as the primary artefact: a deployment can succeed while the underlying data silently degrades, so DataOps adds explicit data testing and observability as first-class concerns. MLOps extends DataOps further to cover model training, evaluation, deployment, and drift monitoring; in practice the disciplines overlap heavily and many organisations run them as a single platform team.
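
The point that a deployment can succeed while data silently degrades can be made concrete with a small volume check. The sketch below flags a daily row count that deviates sharply from recent history using a simple z-score; the `volume_is_anomalous` name and the threshold of 3 standard deviations are assumptions for illustration, and commercial observability platforms generally use more sophisticated models.

```python
from statistics import mean, stdev


def volume_is_anomalous(history, today, threshold=3.0):
    """Return True if today's row count is far from the historical mean.

    `history` is a list of recent daily row counts; `threshold` is the
    number of sample standard deviations treated as abnormal.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical counts")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly stable history: any change at all is suspicious.
        return today != mu
    return abs(today - mu) / sigma > threshold


recent_counts = [1000, 1020, 980, 1010, 990]
print(volume_is_anomalous(recent_counts, 1005))  # a normal day
print(volume_is_anomalous(recent_counts, 400))   # a partial load
```

A job that loads 400 rows instead of roughly 1,000 may still exit successfully, which is why a check of this kind runs on the data rather than on the job status.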

Benefits and adoption challenges

Organisations that adopt DataOps typically report shorter time-to-insight, fewer broken dashboards, more reproducible analytics, and clearer ownership of data assets. Industry surveys regularly cite multi-fold productivity gains for teams that automate testing and deployment compared with those relying on manual processes.

Adoption challenges include legacy ETL systems that resist version control, organisational silos between data engineers and analysts, the cost of refactoring pipelines, and the difficulty of agreeing data contracts between producing and consuming teams.
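
One way to make a data contract easier to agree on is to express it as code that both producing and consuming teams can review and run. The following is a minimal sketch under that assumption; the `Field` and `Contract` classes, the `orders` dataset, and the owner name are hypothetical and do not correspond to any particular catalogue or contract tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    dtype: type
    required: bool = True


@dataclass(frozen=True)
class Contract:
    dataset: str
    owner: str
    fields: tuple

    def violations(self, record):
        """Return a list of ways in which `record` breaks the contract."""
        problems = []
        for field in self.fields:
            value = record.get(field.name)
            if value is None:
                # Optional fields may be absent; required ones may not.
                if field.required:
                    problems.append(f"missing required field: {field.name}")
                continue
            if not isinstance(value, field.dtype):
                problems.append(
                    f"{field.name}: expected {field.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        return problems


# Hypothetical contract published by the producing team.
ORDERS_CONTRACT = Contract(
    dataset="orders",
    owner="sales-data-team",
    fields=(
        Field("order_id", int),
        Field("amount", float),
        Field("coupon", str, required=False),
    ),
)
```

The producer can run `ORDERS_CONTRACT.violations(record)` in its own tests before releasing a change, while consumers can run the same check on arrival, so a breaking schema change surfaces as a failed test rather than a broken dashboard.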

References

  1. Liebmann, L. (2014). 3 reasons why DataOps is essential for big data success. IBM Big Data and Analytics Hub.
  2. DataKitchen. (2017). The DataOps Manifesto.
  3. Bergh, C., Benghiat, G., and Strod, E. (2019). The DataOps Cookbook. DataKitchen.
  4. Atlan. (2025). DataOps: Essential Guide and Principles.
  5. MAMPU. Public Sector Big Data Analytics (DRSA) Strategic Plan.