AutoML
AutoML (Automated Machine Learning) is the process of automating the selection, composition, and tuning of machine learning algorithms and pipelines, enabling practitioners to build effective models with reduced manual effort.
In practice, AutoML covers the automatic selection of a suitable ML algorithm for a given dataset, the engineering and selection of input features, the tuning of model hyperparameters, and the configuration of the end-to-end pipeline from raw data to deployed model. The goal is to reduce the manual expertise and time required to develop effective ML models, which makes machine learning more accessible to practitioners without deep specialisation and speeds up the work of experienced researchers.
AutoML addresses what has been called the full model selection problem: given a dataset and a task objective, find the best combination of preprocessing steps, model architecture, and hyperparameter settings to maximise predictive performance. This problem is computationally intractable to solve exhaustively, so AutoML systems use principled search strategies to find high-quality solutions efficiently.
Core Components
Automated Feature Engineering
Feature engineering — the process of transforming raw data into informative representations suitable for ML algorithms — has traditionally been one of the most labour-intensive and knowledge-dependent steps in building effective models. Automated feature engineering tools search over a space of transformations (aggregations, encodings, interaction terms, polynomial features) and select those that improve model performance.
Tools such as Featuretools and the feature engineering components of platforms like H2O Driverless AI and DataRobot automate large portions of this process, drawing on a library of transformation primitives applied to structured data. For unstructured data such as images, text, and audio, transfer learning from pre-trained deep learning models largely replaces hand-crafted feature engineering.
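The search-and-select loop described above can be illustrated with a minimal sketch. It is not how Featuretools or any named platform works internally: the primitives (squares and pairwise products), the helper names, and the use of absolute Pearson correlation as a cheap stand-in for "improves model performance" are all assumptions made for illustration.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def candidate_features(columns):
    """Apply hypothetical transformation primitives to a dict of named columns."""
    candidates = dict(columns)  # the raw columns are candidates too
    for name, values in columns.items():
        candidates[f"{name}^2"] = [v * v for v in values]  # polynomial feature
    for (a, va), (b, vb) in itertools.combinations(columns.items(), 2):
        candidates[f"{a}*{b}"] = [x * y for x, y in zip(va, vb)]  # interaction term
    return candidates


def select_features(columns, target, k=2):
    """Rank candidates by |correlation with the target| and keep the top k names."""
    scored = {
        name: abs(pearson(values, target))
        for name, values in candidate_features(columns).items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Toy data: the target is exactly width * height, so the interaction term ranks first.
columns = {"width": [1, 2, 3, 4, 5, 6], "height": [6, 1, 4, 2, 5, 3]}
target = [w * h for w, h in zip(columns["width"], columns["height"])]
best = select_features(columns, target, k=2)
```

A production system would more plausibly score each candidate by the cross-validated performance of a model trained with it, which is slower but also captures non-linear usefulness that a correlation filter misses.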
Hyperparameter Optimisation
Most ML algorithms have hyperparameters — settings that are fixed before training and control model complexity, regularisation, and learning dynamics. Examples include the learning rate and depth in gradient boosted trees, the number of layers and neurons in a neural network, and the regularisation strength in a support vector machine. Finding good hyperparameter values manually requires expert intuition and extensive trial and error.
AutoML systems automate this search using several strategies. Grid search exhaustively evaluates a predefined grid of hyperparameter values, but the number of evaluations grows exponentially with the number of hyperparameters. Random search samples configurations at random from the hyperparameter space and is usually more efficient than grid search in high-dimensional spaces, where only a few hyperparameters tend to matter. Bayesian optimisation builds a probabilistic surrogate model of the objective function and uses an acquisition function, such as expected improvement, to choose the next configuration, balancing exploration of uncertain regions against exploitation of regions already known to perform well. Successive halving and Hyperband allocate compute budgets adaptively, terminating poorly performing configurations early so that resources go to promising ones.
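As a rough sketch of how two of these strategies fit together, the following draws candidate configurations by random search and then prunes them with successive halving. The search space, the `evaluate` function (a synthetic stand-in for training a model and measuring validation loss), and the parameter names `min_budget` and `eta` are assumptions chosen for illustration.

```python
import math
import random


def sample_config(rng):
    """Random search: draw one configuration (log-uniform learning rate, uniform depth)."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, 0),
        "max_depth": rng.randint(2, 10),
    }


def evaluate(config, budget):
    """Hypothetical stand-in for 'train with this budget and return validation loss'.

    The synthetic loss is lowest near learning_rate=0.1 and max_depth=6, and an
    extra penalty of 1/budget shrinks as more budget is spent.
    """
    lr_term = (math.log10(config["learning_rate"]) + 1) ** 2
    depth_term = ((config["max_depth"] - 6) / 4) ** 2
    return lr_term + depth_term + 1.0 / budget


def successive_halving(configs, min_budget=1, eta=2):
    """Score all configs on a small budget, keep the best 1/eta, multiply the budget by eta, repeat."""
    survivors = list(configs)
    budget = min_budget
    schedule = []  # (number of configs evaluated, budget per config) for each round
    while len(survivors) > 1:
        schedule.append((len(survivors), budget))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0], schedule


rng = random.Random(0)
candidates = [sample_config(rng) for _ in range(16)]
best_config, schedule = successive_halving(candidates)
```

In this toy objective the ranking of configurations does not change with the budget, so nothing is lost by pruning early. With real training curves a slow starter can be discarded prematurely, which is the trade-off Hyperband addresses by running several such brackets with different starting budgets.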
Neural Architecture Search
Neural Architecture Search (NAS) applies AutoML principles to the design of deep neural network architectures. Instead of relying on human-designed architectures, NAS algorithms search over a predefined space of architectural choices (number of layers, layer types, skip connections, activation functions) to find architectures that achieve high performance on the target task. Early NAS methods were computationally expensive, requiring thousands of GPU days, but efficient approaches including DARTS (Differentiable Architecture Search) and weight-sharing methods have substantially reduced this cost.
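The core idea behind differentiable search can be sketched in a few lines: rather than choosing one operation per connection, every candidate operation is applied and the results are blended with softmax weights over learnable architecture parameters. The version below is a heavily simplified illustration on scalars, not the published DARTS implementation. The candidate operations and function names are hypothetical, and the gradient-based training of the weights is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations for one connection, acting on a scalar input.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "relu": lambda x: max(0.0, x),
    "zero": lambda x: 0.0,
}


def mixed_op(x, alphas):
    """Continuous relaxation: softmax(alphas)-weighted sum of every candidate op's output."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretise(alphas):
    """After the search, keep only the operation with the largest architecture weight."""
    names = list(CANDIDATE_OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]


alphas = [0.2, 1.5, -0.3]        # architecture parameters; fixed here, learned during a real search
output = mixed_op(-2.0, alphas)  # blends identity(-2.0), relu(-2.0) and zero(-2.0)
chosen = discretise(alphas)
```

Because `mixed_op` is differentiable with respect to `alphas`, the architecture weights can be updated by gradient descent alongside the network weights, which is what removes the need to train each candidate architecture separately.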
Pipeline Combination (Combined Algorithm Selection and Hyperparameter Optimisation)
The full AutoML problem of jointly selecting the algorithm and its hyperparameters is known as the Combined Algorithm Selection and Hyperparameter Optimisation (CASH) problem. Systems such as Auto-sklearn and TPOT (Tree-based Pipeline Optimisation Tool) address it by treating the full pipeline configuration as a structured search problem. Auto-sklearn uses Bayesian optimisation and applies meta-learning (learning from results on prior datasets) to warm-start the search with promising configurations, whereas TPOT evolves pipelines with genetic programming.
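For reference, the CASH objective is usually written along the following lines. The notation is an assumption that follows the common presentation in the literature rather than anything defined in this article: $\mathcal{A} = \{A^{(1)}, \dots, A^{(R)}\}$ is the set of candidate algorithms, $\Lambda^{(j)}$ is the hyperparameter space of algorithm $A^{(j)}$, $D^{(i)}_{\mathrm{train}}$ and $D^{(i)}_{\mathrm{valid}}$ are the training and validation parts of the $i$-th of $K$ cross-validation folds, and $\mathcal{L}$ is the loss of the algorithm trained on the former and evaluated on the latter.

```latex
A^{*}, \lambda^{*} \in
\operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\; \lambda \in \Lambda^{(j)}}
\frac{1}{K} \sum_{i=1}^{K}
\mathcal{L}\!\left( A^{(j)}_{\lambda},\; D^{(i)}_{\mathrm{train}},\; D^{(i)}_{\mathrm{valid}} \right)
```

The search space is hierarchical: the choice of $A^{(j)}$ determines which hyperparameters in $\Lambda^{(j)}$ are active, which is why CASH is treated as a structured rather than a flat optimisation problem.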
Major Platforms
Cloud-based AutoML platforms include Google Cloud AutoML (now part of Vertex AI), Azure Machine Learning's automated ML, and Amazon SageMaker Autopilot. These services provide point-and-click or API-based access to AutoML for tabular data, image classification, and text classification, handling data preprocessing, training, and deployment as managed services.
Open-source AutoML frameworks include Auto-sklearn (built on scikit-learn), AutoKeras (built on Keras/TensorFlow), Auto-PyTorch, FLAML from Microsoft Research, and TPOT. H2O Driverless AI and DataRobot are commercial platforms with especially strong support for enterprise tabular data use cases.
Limitations and Considerations
AutoML does not eliminate the need for domain expertise. Problem framing — deciding what to predict, how to define success, and how to handle domain-specific data quality issues — remains a fundamentally human task. AutoML also tends to produce models that are optimised for a single metric on a held-out validation set, without necessarily considering interpretability, fairness, or operational constraints.
Computational cost is another practical limitation. Full AutoML searches can require substantial compute, and cloud AutoML services incur costs that scale with search time. Efficient AutoML strategies and early stopping are important for practical deployment.