What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

Ensemble Learning

Ensemble learning is a machine learning technique that combines the predictions of multiple models to achieve higher accuracy and robustness than any single constituent model could attain on its own.

4 min read | Last updated June 2026 | Foundations

Overview

Ensemble learning is a family of machine learning techniques that combine the outputs of several individual models, called base learners (or weak learners when each is only slightly better than chance), to produce a single, stronger prediction. The guiding principle is that a group of diverse models, aggregated appropriately, tends to make fewer errors than any one model acting alone, because the learners' mistakes are at least partly uncorrelated and so partially cancel out.

The idea draws on the statistical observation that averaging reduces variance. When base learners are accurate and sufficiently uncorrelated, combining them improves generalisation to unseen data. Ensemble methods are among the most reliable performers in applied machine learning and consistently feature in winning solutions on competitive platforms such as Kaggle.
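The effect of correlation on averaging can be made concrete with a small sketch. It assumes a simplified setting that the article does not spell out: M base learners whose predictions each have variance sigma2 and share the same pairwise correlation rho. Under that assumption the variance of their average is rho * sigma2 + (1 - rho) * sigma2 / M. The function name `ensemble_variance` is illustrative only.

```python
def ensemble_variance(sigma2: float, rho: float, m: int) -> float:
    """Variance of the mean of m predictors.

    Assumes every predictor has variance sigma2 and every pair of
    predictors has the same correlation rho (a simplifying assumption).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    return rho * sigma2 + (1.0 - rho) * sigma2 / m


# Compare uncorrelated, partly correlated and perfectly correlated learners.
for rho in (0.0, 0.5, 1.0):
    row = [round(ensemble_variance(1.0, rho, m), 3) for m in (1, 10, 100)]
    print(f"rho={rho}: variance for M=1, 10, 100 -> {row}")
```

With rho = 0 the variance shrinks in proportion to 1/M, while with rho = 1 adding models does not help at all. Between those extremes the variance cannot fall below rho * sigma2, which is why diversity among base learners matters as much as their individual accuracy.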

How ensembles work

Ensembles differ mainly in how base learners are trained and how their outputs are combined. Three broad families dominate practice: bagging, boosting, and stacking.

Bagging

Bootstrap aggregating, or bagging, trains each base learner on a different random sample drawn with replacement from the training data. Predictions are then averaged for regression or decided by majority vote for classification. Because each model sees a slightly different dataset, the ensemble reduces variance without substantially increasing bias. The random forest algorithm extends bagging by also randomising the subset of features considered at each split of a decision tree, which further decorrelates the trees and produces a robust general-purpose model for both classification and regression.
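The bagging procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production implementation: the base learner is a one-dimensional decision stump, the toy data are invented for the example, and the helper names (`fit_stump`, `fit_bagging`, `predict_bagging`) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D decision stump with 0/1 labels.

    Returns (threshold, label_right): the stump predicts label_right
    when x > threshold and the other class otherwise.
    """
    best = None
    for t in sorted(set(xs)):
        for right in (0, 1):
            errors = sum((right if x > t else 1 - right) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, right)
    return best[1], best[2]


def predict_stump(stump, x):
    t, right = stump
    return right if x > t else 1 - right


def fit_bagging(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (invented): class 1 for larger x, with one noisy label at x=4.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
models = fit_bagging(xs, ys, n_models=25, seed=0)
print([predict_bagging(models, x) for x in (0, 11)])
```

Individual stumps pick different thresholds depending on which points land in their bootstrap sample, and the vote smooths over those differences. A random forest would additionally restrict each split to a random subset of features, which this single-feature sketch cannot show.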

Boosting

Boosting trains base learners sequentially, with each new model focusing on the examples its predecessors handled poorly. In AdaBoost, an early and influential algorithm, misclassified instances receive greater weight in the next round, so the ensemble gradually concentrates effort on the hardest cases. Gradient boosting generalises the idea by framing it as gradient descent on a loss function: each new learner is fitted to the negative gradient of the loss (the residuals, in the squared-error case) rather than to reweighted examples. Modern implementations such as XGBoost, LightGBM and CatBoost are widely used for structured and tabular data.
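The reweighting idea behind AdaBoost can be sketched as follows. This is a simplified illustration using one-dimensional decision stumps and labels in {-1, +1}, not the implementation found in any particular library. The data set is invented so that no single stump can fit it, and the function names are hypothetical.

```python
import math


def fit_weighted_stump(xs, ys, w):
    """Best stump under example weights w.

    The stump predicts s when x > t and -s otherwise. Returns (error, t, s).
    A threshold below every x is included so a constant prediction is possible.
    """
    best = None
    for t in [min(xs) - 1] + sorted(set(xs)):
        for s in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (s if x > t else -s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best


def fit_adaboost(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n          # start with uniform weights
    ensemble = []              # list of (alpha, t, s)
    for _ in range(n_rounds):
        err, t, s = fit_weighted_stump(xs, ys, w)
        if err >= 0.5:         # no better than chance: stop
            break
        err = max(err, 1e-10)  # avoid division by zero on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Increase the weight of misclassified points, decrease the rest.
        w = [wi * math.exp(-alpha * y * (s if x > t else -s))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_adaboost(ensemble, x):
    score = sum(alpha * (s if x > t else -s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, xs, ys):
    hits = sum(predict_adaboost(ensemble, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


# Invented data: positive at both ends, negative in the middle.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
for rounds in (1, 3):
    print(rounds, accuracy(fit_adaboost(xs, ys, rounds), xs, ys))
```

On this toy data one stump classifies 7 of the 10 points correctly, while three boosted stumps classify all 10, because each round shifts weight onto the points the previous stumps got wrong. Gradient boosting libraries follow a different update rule, so this sketch should not be read as a description of XGBoost, LightGBM or CatBoost.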

Stacking and voting

Stacking, or stacked generalisation, trains a higher-level model, the meta-learner, to combine the predictions of several diverse base models. Simpler aggregation schemes include hard voting, which takes the majority class, and soft voting, which averages the predicted class probabilities and picks the class with the highest average. Stacking can blend fundamentally different model types, such as decision trees, support vector machines and neural networks.
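The difference between hard and soft voting is easiest to see on a case where they disagree. The sketch below assumes three hypothetical models that each output class probabilities for a single transaction. The class names and numbers are invented for illustration, and stacking itself is not shown.

```python
from collections import Counter


def hard_vote(labels):
    """Return the class predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(prob_rows):
    """Average per-class probabilities across models.

    prob_rows holds one dict per model, mapping class name to probability.
    Returns (winning_class, averaged_probabilities).
    """
    classes = list(prob_rows[0])
    avg = {c: sum(row[c] for row in prob_rows) / len(prob_rows)
           for c in classes}
    return max(avg, key=avg.get), avg


# Hypothetical outputs of three models for one transaction.
probs = [
    {"legit": 0.55, "fraud": 0.45},
    {"legit": 0.55, "fraud": 0.45},
    {"legit": 0.05, "fraud": 0.95},
]

# Each model's own label is its most probable class.
labels = [max(row, key=row.get) for row in probs]

print("hard vote:", hard_vote(labels))
print("soft vote:", soft_vote(probs)[0])
```

Here two lukewarm votes for "legit" win the hard vote, while the one confident "fraud" prediction pulls the averaged probability above one half and wins the soft vote. Soft voting therefore only makes sense when the base models produce reasonably calibrated probabilities.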

Bias-variance trade-off

Ensembles are best understood through the bias-variance decomposition of prediction error. Bagging primarily reduces variance, making it effective for high-variance, low-bias learners such as deep decision trees. Boosting primarily reduces bias by combining many simple, high-bias learners such as shallow trees, and can also lower variance, but it can overfit noisy data if too many rounds are run. Choosing the right ensemble strategy therefore depends on the characteristics of the base learner and the dataset.

Applications

Ensemble methods power credit scoring, fraud detection, churn prediction, demand forecasting and ranking systems. They are favoured wherever tabular data dominates and the influence of individual features matters, since techniques such as feature importance and SHAP values can be applied to tree ensembles. The main trade-offs are higher computational cost and reduced transparency compared with a single model.

Malaysian Context — Ensemble Methods in Local Industry

Malaysian banks including Maybank, CIMB and Public Bank rely heavily on ensemble models such as gradient-boosted trees for credit risk scoring, fraud detection and customer churn prediction, areas where tabular ensembles remain the practical state of the art. Such deployments fall within Bank Negara Malaysia's expectations on model risk and the responsible use of analytics and artificial intelligence in financial services.

E-commerce and ride-hailing platforms with significant Malaysian operations, including Shopee, Lazada and Grab, use ensemble models for recommendation ranking, dynamic pricing and abuse detection across Southeast Asian markets. Telecommunications providers such as Maxis, CelcomDigi and TM apply ensembles to predict network faults and customer attrition.

Adoption is supported by Malaysia's analytics talent pipeline. HRD Corp subsidises data science and machine learning training for employers, while MDEC promotes data-driven adoption among small and medium enterprises. Universities including Universiti Malaya and Universiti Sains Malaysia teach ensemble methods within their data science curricula, and the Personal Data Protection Act 2010 governs the customer data on which many of these models are trained.

References

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.
Freund, Y. and Schapire, R. (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1), 119-139.
Chen, T. and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 785-794.
Bank Negara Malaysia. (2024). Guidance on the responsible use of artificial intelligence in financial services.

Tags: ensemble methods, bagging, boosting, random forest

Type	Machine learning technique
Main families	Bagging, Boosting, Stacking
Notable algorithms	Random Forest, XGBoost, AdaBoost
Key benefit	Reduced variance and bias
Related	Random forest, Gradient boosting