Hyperparameter Tuning

The process of selecting optimal configuration values for a machine learning model's external parameters using methods such as grid search, random search, and Bayesian optimisation.

6 min read | Last updated May 2026 | Infrastructure

Hyperparameter tuning, also called hyperparameter optimisation (HPO), is the process of choosing the values of a machine learning model's external configuration settings (its hyperparameters) to maximise performance on a held-out validation set. Hyperparameters differ from model parameters in that they are set before training rather than learned from the data. Examples include the learning rate of a neural network, the maximum depth and minimum leaf size of a decision tree, the number of attention heads in a transformer, and the regularisation strength of a logistic regression. Good hyperparameter choices often matter as much as the choice of model family itself, and HPO is now a routine step in most disciplined machine learning workflows.

Why tuning matters

A poorly tuned model can underperform a much simpler, well-tuned model by a large margin. Conversely, the same architecture and dataset can produce very different results after small changes in optimiser learning rate, batch size, or weight decay. Tuning provides a systematic, reproducible procedure for finding good configurations and for quantifying how sensitive a model is to its hyperparameters, which is useful for both performance and robustness assessment.

Main methods

Grid search exhaustively evaluates every combination of values drawn from a predefined finite set for each hyperparameter. It is simple, embarrassingly parallel, and easy to reason about. Its cost is the product of the per-hyperparameter set sizes, so it grows exponentially with the number of hyperparameters: five values for each of four hyperparameters already requires 5^4 = 625 training runs. This makes it impractical beyond a handful of dimensions. Grid search also wastes resources when only a few hyperparameters meaningfully affect performance, because it spends equal effort on irrelevant ones.
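
As a minimal sketch of the enumeration described above, the following builds every combination from a small grid and picks the best one. The hyperparameter names, the candidate values, and the `toy_validation_score` function are hypothetical stand-ins; a real objective would train a model and score it on the validation set.

```python
from itertools import product

# Hypothetical search space: names and values are illustrative only.
grid = {
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "max_depth": [2, 4, 8],
    "min_samples_leaf": [1, 5, 10],
}


def grid_configs(space):
    """Yield every combination of values as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def toy_validation_score(cfg):
    """Stand-in objective (higher is better); not a real model evaluation."""
    return -abs(cfg["learning_rate"] - 1e-2) - 0.01 * abs(cfg["max_depth"] - 4)


configs = list(grid_configs(grid))
n_configs = len(configs)  # 3 * 3 * 3 = 27 evaluations
best = max(configs, key=toy_validation_score)
```

Adding a fourth hyperparameter with three values would triple the count to 81, which is the multiplicative growth that limits grid search in practice.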

Random search samples configurations from specified distributions (uniform, log-uniform, categorical) rather than from a fixed grid. Bergstra and Bengio showed in 2012 that random search reaches comparable or better performance than grid search with far fewer evaluations when only a subset of the hyperparameters truly matters, which is typically the case in deep learning. The reason is that every random trial tests a new value of each important hyperparameter, whereas a grid repeats the same few values. Random search is widely recommended as the baseline for a tuning study.
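
The coverage argument can be illustrated with a small sketch that compares a 3 x 3 grid against nine random draws. The two hyperparameters (learning rate and dropout) and their ranges are assumptions chosen for illustration, and no model is trained here.

```python
import random

# A 3 x 3 grid: nine trials, but only three distinct learning rates.
grid = [(lr, dropout) for lr in (1e-4, 1e-3, 1e-2) for dropout in (0.0, 0.25, 0.5)]
grid_lrs = {lr for lr, _ in grid}

# Nine random trials: the learning rate is drawn log-uniformly from
# [1e-5, 1e-1] and dropout uniformly from [0, 0.5].
rng = random.Random(0)
random_trials = [
    (10 ** rng.uniform(-5, -1), rng.uniform(0.0, 0.5)) for _ in range(len(grid))
]
random_lrs = {lr for lr, _ in random_trials}

# With the same budget, random search probes more distinct values of
# whichever hyperparameter turns out to matter.
distinct_grid, distinct_random = len(grid_lrs), len(random_lrs)
```

If the learning rate is the only hyperparameter that matters, the grid has effectively spent nine runs to test three values, while random search has tested nine.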

Bayesian optimisation

Bayesian optimisation builds a probabilistic surrogate model of the objective function from past observations, usually a Gaussian process or a tree-structured Parzen estimator (TPE). An acquisition function such as expected improvement balances exploration of uncertain regions against exploitation of promising ones and selects the next configuration to evaluate. Bayesian methods typically reach good configurations in far fewer evaluations than grid or random search, at the cost of a more complex implementation and reduced parallelism, because each suggestion depends on earlier results. TPE is the default sampler in Optuna and the main algorithm in Hyperopt, while BoTorch is a library dedicated to Gaussian-process-based Bayesian optimisation.
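
To make the exploration and exploitation trade-off concrete, here is a minimal sketch of the expected improvement acquisition function for a maximisation problem, assuming the surrogate returns a Gaussian posterior (mean and standard deviation) at each candidate. The candidate names and their predicted values are hypothetical, and the optional `xi` margin is an assumption of this sketch rather than something the text above specifies.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far, xi=0.0):
    """Expected improvement over best_so_far under a N(mean, std^2) posterior."""
    improvement = mean - best_so_far - xi
    if std <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(improvement, 0.0)
    z = improvement / std
    return improvement * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical surrogate predictions (mean, std) for three candidates.
candidates = {
    "near_best_certain": (0.90, 0.01),
    "slightly_worse_uncertain": (0.88, 0.10),
    "clearly_worse": (0.70, 0.02),
}
best_so_far = 0.90
scores = {
    name: expected_improvement(mean, std, best_so_far)
    for name, (mean, std) in candidates.items()
}
next_candidate = max(scores, key=scores.get)
```

In this toy setting the uncertain candidate wins even though its predicted mean is slightly lower than the incumbent, which is the exploratory behaviour the acquisition function is designed to produce.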

Hyperband and successive halving

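Hyperband, introduced by Li and colleagues in 2017, exploits the fact that many bad configurations can be ruled out after only a few epochs. Its building block is successive halving, which allocates a small budget to many configurations, keeps the best fraction (the top 1/η for a reduction factor η, commonly 2 or 3), and repeats with the budget per surviving configuration multiplied by η. Because the right trade-off between the number of configurations and the starting budget is not known in advance, Hyperband runs several successive-halving brackets that each make this trade-off differently. BOHB combines Bayesian optimisation with Hyperband to inherit the strengths of both.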
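
The following is a minimal sketch of a single successive-halving bracket, not a full Hyperband implementation (which would run several such brackets with different starting budgets). The `toy_score` function is a hypothetical stand-in for training a configuration for a given number of epochs, and the reduction factor of 3 is an assumption.

```python
import math
import random


def toy_score(config, budget):
    """Stand-in for the validation score after `budget` epochs (higher is better).

    Configurations near learning_rate = 1e-2 score best, and a larger
    budget improves every configuration's score.
    """
    quality = -abs(math.log10(config["learning_rate"]) + 2.0)
    return quality * (1.0 + 1.0 / budget)


def successive_halving(configs, min_budget=1, eta=3):
    """Keep the top 1/eta of configs each round, multiplying the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget per config, number of configs evaluated) per round
    while True:
        history.append((budget, len(survivors)))
        ranked = sorted(survivors, key=lambda c: toy_score(c, budget), reverse=True)
        if len(ranked) == 1:
            return ranked[0], history
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta


rng = random.Random(0)
configs = [{"learning_rate": 10 ** rng.uniform(-5, -1)} for _ in range(27)]
best, history = successive_halving(configs)
total_epochs = sum(budget * n for budget, n in history)  # 27 + 27 + 27 + 27 = 108
```

Training all 27 configurations for the full 27 epochs would cost 729 epochs, so the bracket uses roughly a seventh of that budget. The saving relies on early scores being predictive of final scores, which holds by construction in this toy objective but not always in practice.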

Evolutionary and population-based methods

Evolutionary algorithms maintain a population of configurations and apply mutation and selection. Population-Based Training (PBT), developed and used at DeepMind for tasks including reinforcement learning, trains a population of models in parallel. Periodically, poorly performing members copy the weights and hyperparameters of better-performing members (exploit) and then perturb the copied hyperparameters (explore). These methods are well suited to long training runs where hyperparameters might benefit from being changed during training rather than fixed up front.
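
A single exploit-and-explore step of PBT might be sketched as follows. The member layout (`weights`, `hparams`, `score`), the bottom fraction of 25%, and the perturbation factors 0.8 and 1.2 are illustrative assumptions, and the training between steps is omitted.

```python
import copy
import random


def pbt_step(population, rng, bottom_frac=0.25, perturb_factors=(0.8, 1.2)):
    """Apply one exploit/explore step in place and return the population.

    Each member is a dict: {"weights": ..., "hparams": {...}, "score": float}.
    """
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    n_replace = max(1, int(len(ranked) * bottom_frac))
    top, bottom = ranked[:n_replace], ranked[-n_replace:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: copy weights and hyperparameters from a better member.
        loser["weights"] = copy.deepcopy(winner["weights"])
        loser["hparams"] = dict(winner["hparams"])
        # Explore: perturb each copied hyperparameter.
        for name in loser["hparams"]:
            loser["hparams"][name] *= rng.choice(perturb_factors)
    return population


rng = random.Random(0)
population = [
    {"weights": [0.0], "hparams": {"learning_rate": lr}, "score": score}
    for lr, score in [(1e-1, 0.2), (1e-2, 0.9), (1e-3, 0.6), (1e-4, 0.4)]
]
pbt_step(population, rng)
# The worst member (initially learning_rate = 1e-1) now trains with a
# perturbed copy of the best member's learning rate; its score is stale
# until it has been trained and evaluated again.
```

In a full loop this step would alternate with further training of every member, so the hyperparameters follow a schedule that adapts over the course of training.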

Tooling

Open-source frameworks have matured significantly. Optuna and Ray Tune are widely used Python libraries that support distributed execution, pruning, and integration with common ML frameworks. Hyperopt remains popular for TPE-based search. KerasTuner is convenient for Keras users. Weights & Biases Sweeps and MLflow integrate tuning runs with broader experiment tracking. Cloud providers offer managed services including Vertex AI Vizier, Amazon SageMaker Automatic Model Tuning, and Azure ML's hyperparameter tuning.

Practical considerations

Tuning is bounded by compute budget. Practical workflows usually start with a small random search to understand sensitivity, then move to Bayesian methods or Hyperband within the promising region of the space. Search spaces should be specified on the appropriate scale: log-uniform for learning rates and regularisation strengths, integer-valued for layer counts. Early stopping of unpromising trials is essential, because training every configuration to convergence is rarely affordable. Cross-validation should be used cautiously because k-fold evaluation multiplies the cost of every trial by k. For very large models such as foundation LLMs, full HPO is impractical and practitioners rely on well-documented community defaults, ablation on smaller proxies, and learning-rate range tests.
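
The point about scales can be shown with a small sketch of a typed search space and a sampler for it. The parameter names, ranges, and the tuple-based spec format are hypothetical; tuning libraries provide their own equivalents.

```python
import math
import random

# Hypothetical search space; names and ranges are illustrative only.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 1e-5, 1e-1),
    "weight_decay": ("log_uniform", 1e-6, 1e-2),
    "num_layers": ("int", 2, 8),
    "optimizer": ("categorical", ["sgd", "adam"]),
}


def sample(space, rng):
    """Draw one configuration, respecting each parameter's scale and type."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "log_uniform":
            low, high = spec[1], spec[2]
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "int":
            config[name] = rng.randint(spec[1], spec[2])  # inclusive on both ends
        elif kind == "categorical":
            config[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown parameter kind: {kind}")
    return config


rng = random.Random(0)
trials = [sample(SEARCH_SPACE, rng) for _ in range(1000)]

# On a log scale each of the four decades between 1e-5 and 1e-1 receives
# about a quarter of the draws.
share_below_1e4 = sum(t["learning_rate"] < 1e-4 for t in trials) / len(trials)
```

Sampling the learning rate uniformly on a linear scale over the same range would place fewer than 0.1% of trials below 1e-4, leaving the small-learning-rate region almost unexplored.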

Pitfalls

Common pitfalls include tuning on the test set rather than a separate validation set, ignoring random seed variance, choosing search spaces that are too narrow or too wide, and failing to report the search budget alongside the reported results. Reporting only the best run obscures how sensitive a method is to hyperparameter choice; reporting the distribution across trials gives a much more honest picture of robustness.
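
As a sketch of the reporting practice described above, the following summarises all trials of a search rather than only the best one. The scores for the two methods are made-up numbers used purely for illustration, and the choice of summary statistics is an assumption.

```python
import statistics


def summarise_trials(scores):
    """Summarise validation scores across all trials of one search."""
    ordered = sorted(scores)
    return {
        "n_trials": len(ordered),
        "best": ordered[-1],
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "worst": ordered[0],
    }


# Hypothetical validation accuracies from five trials of each method.
method_a = [0.91, 0.62, 0.58, 0.55, 0.60]
method_b = [0.88, 0.86, 0.87, 0.85, 0.86]

summary_a = summarise_trials(method_a)
summary_b = summarise_trials(method_b)
```

Judged by the best run alone, method A looks stronger, but its median is far lower and its spread far wider, so method B is the more robust choice under the same search budget.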

References

  1. Bergstra, J., & Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization. JMLR.
  2. Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian Optimization of Machine Learning Algorithms. NeurIPS.
  3. Li, L. et al. (2017). Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. JMLR.
  4. Akiba, T. et al. (2019). Optuna: A Next-Generation Hyperparameter Optimization Framework. KDD.
  5. Bank Negara Malaysia. (2023). Risk Management in Technology (RMiT) Policy Document. bnm.gov.my.