Ensemble Learning
Ensemble learning is a machine learning technique that combines the predictions of multiple models to achieve higher accuracy and robustness than any single constituent model could attain on its own.
Overview
Ensemble learning is a family of techniques in machine learning that combine the outputs of several individual models, often called base learners or weak learners, to produce a single, stronger prediction. The guiding principle is that a group of diverse models, when aggregated appropriately, tends to make fewer errors than any one model acting alone, because the independent mistakes of individual learners partially cancel out.
The idea draws on the statistical observation that averaging reduces variance. When base learners are accurate and sufficiently uncorrelated, combining them improves generalisation to unseen data. Ensemble methods are among the most reliable performers in applied machine learning and consistently feature in winning solutions on competitive platforms such as Kaggle.
How ensembles work
Ensembles differ mainly in how base learners are trained and how their outputs are combined. Three broad families dominate practice: bagging, boosting, and stacking.
Bagging
Bootstrap aggregating, or bagging, trains each base learner on a different random sample drawn with replacement from the training data. Predictions are then averaged for regression or decided by majority vote for classification. Because each model sees a slightly different dataset, the ensemble reduces variance without substantially increasing bias. The random forest algorithm extends bagging by also randomising the subset of features considered at each split of a decision tree, producing a robust general-purpose classifier.
Boosting
Boosting trains base learners sequentially, with each new model focusing on the examples its predecessors handled poorly. Misclassified instances receive greater weight, so the ensemble gradually concentrates effort on the hardest cases. AdaBoost was an early influential algorithm, while gradient boosting frames the process as gradient descent on a loss function. Modern implementations such as XGBoost, LightGBM and CatBoost are widely used for structured and tabular data.
Stacking and voting
Stacking, or stacked generalisation, trains a higher-level model — the meta-learner — to combine the predictions of several diverse base models. Simpler aggregation schemes include hard voting, which takes the majority class, and soft voting, which averages predicted probabilities. Stacking can blend fundamentally different model types, such as decision trees, support vector machines and neural networks.
Bias-variance trade-off
Ensembles are best understood through the bias-variance decomposition of prediction error. Bagging primarily reduces variance, making it effective for high-variance, low-bias learners such as deep decision trees. Boosting reduces both bias and variance but can overfit noisy data if too many rounds are run. Choosing the right ensemble strategy therefore depends on the characteristics of the base learner and the dataset.
Applications
Ensemble methods power credit scoring, fraud detection, churn prediction, demand forecasting and ranking systems. They are favoured wherever tabular data dominates and the influence of individual features matters, since techniques such as feature importance and SHAP values can be applied to tree ensembles. The main trade-offs are higher computational cost and reduced transparency compared with a single model.
References
- Breiman, L. (2001). Random Forests. Machine Learning, 45(1).
- Freund, Y. and Schapire, R. (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences.
- Chen, T. and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. KDD.
- Bank Negara Malaysia. (2024). Guidance on the responsible use of artificial intelligence in financial services.