What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Decision Tree

A decision tree is a non-parametric supervised learning model that predicts an outcome by recursively splitting data into branches based on feature values, used for both classification and regression.

4 min read | Last updated June 2026 | Foundations

A decision tree is a non-parametric supervised machine learning model that predicts a target value by following a sequence of simple tests on the input features, arranged in a tree structure. Each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf node holds a predicted result. Decision trees are used for both classification, where leaves represent class labels, and regression, where leaves represent numerical values. Their transparency makes them one of the most interpretable models in machine learning.
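The node, branch, and leaf structure described above can be sketched in a few lines of Python. This is only an illustration: the `Node` and `Leaf` classes, the `predict` helper, and the feature names `temperature` and `pressure` are hypothetical, and the tree is written by hand rather than learned from data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    # Predicted result: a class label (classification) or a number (regression).
    prediction: Union[str, float]


@dataclass
class Node:
    feature: str                 # feature tested at this internal node
    threshold: float             # the test is: sample[feature] <= threshold
    left: Union["Node", Leaf]    # branch followed when the test is true
    right: Union["Node", Leaf]   # branch followed when the test is false


def predict(tree, sample):
    """Follow the tests from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hypothetical hand-built tree for a quality check (illustrative values only).
quality_tree = Node(
    "temperature", 80.0,
    left=Leaf("pass"),
    right=Node("pressure", 2.5, left=Leaf("inspect"), right=Leaf("fail")),
)

print(predict(quality_tree, {"temperature": 75.0, "pressure": 3.0}))
print(predict(quality_tree, {"temperature": 90.0, "pressure": 3.0}))
```

Each prediction visits one node per level, which is why the path taken can be read back as an explanation of the result.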

Structure and how it learns

A decision tree partitions the feature space into distinct regions by recursively splitting the data. Starting from the root, the algorithm selects the feature and threshold that best separate the training examples, creates child nodes for each outcome, and repeats the process on each subset until a stopping condition is met, such as a maximum depth or a minimum number of samples per leaf. To decide which split is best, the algorithm uses a criterion that measures how well a split separates the data. For classification, common criteria are Gini impurity and information gain based on entropy; for regression, splits are usually chosen to minimise the variance or the sum of squared residuals within each resulting group.
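The split criteria named above can be computed directly. The following is a minimal sketch using only the standard library; the function names (`gini`, `entropy`, `information_gain`, `sse`, `best_threshold`) are chosen here for illustration, and the threshold search is a simple exhaustive scan over midpoints for a single numerical feature, not an optimised implementation.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_impurity(left, right, impurity=gini):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)


def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting parent into left and right."""
    return entropy(parent) - weighted_impurity(left, right, entropy)


def sse(values):
    """Sum of squared residuals around the mean (regression criterion)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(values, labels, impurity=gini):
    """Scan midpoints between distinct sorted values; return (threshold, score)."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = weighted_impurity(left, right, impurity)
        if score < best[1]:
            best = (t, score)
    return best


# Toy data: the two classes are cleanly separated along one feature.
values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(values, labels))
```

A lower weighted impurity means purer child nodes, so the search keeps the threshold with the smallest score. For regression the same scan applies with `sse` of the target values in each child summed in place of the weighted impurity.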

The CART algorithm, short for Classification and Regression Trees, is a widely used method that builds binary trees and supports both task types. Other algorithms, notably Quinlan's ID3 and its successor C4.5, popularised entropy-based splitting for classification.
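To make the recursive, binary splitting concrete, here is a simplified CART-style classifier in plain Python. It is a teaching sketch under stated assumptions, not the published CART procedure: it uses Gini impurity only, handles numerical features only, omits pruning, and the names `best_split`, `build`, and `predict` as well as the dictionary layout of the tree are invented for this example.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, min_samples_leaf=1):
    """Search every feature and midpoint threshold for the lowest weighted Gini."""
    best = None  # (score, feature_index, threshold)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if min(len(left), len(right)) < min_samples_leaf:
                continue  # stopping condition: children would be too small
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(rows, labels, depth=0, max_depth=3, min_samples_leaf=1):
    """Recursively grow a binary tree stored as nested dictionaries."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the depth limit is reached or the node is already pure.
    if depth >= max_depth or len(set(labels)) == 1:
        return {"leaf": majority}
    split = best_split(rows, labels, min_samples_leaf)
    if split is None:  # no valid split exists
        return {"leaf": majority}
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build([rows[i] for i in left], [labels[i] for i in left],
                      depth + 1, max_depth, min_samples_leaf),
        "right": build([rows[i] for i in right], [labels[i] for i in right],
                       depth + 1, max_depth, min_samples_leaf),
    }


def predict(tree, row):
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Toy data: the first feature separates the classes, the second does not.
rows = [(1.0, 5.0), (2.0, 4.0), (3.0, 7.0), (6.0, 5.0), (7.0, 8.0), (8.0, 6.0)]
labels = ["low", "low", "low", "high", "high", "high"]
tree = build(rows, labels)
print(tree)
```

The `max_depth` and `min_samples_leaf` arguments correspond to the stopping conditions mentioned earlier; tightening them produces smaller trees.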

Strengths and weaknesses

Decision trees are easy to understand and visualise, require little data preparation, handle both numerical and categorical features, and make predictions quickly: a prediction follows a single root-to-leaf path, so its cost grows with the depth of the tree, which for a reasonably balanced tree is roughly logarithmic in the number of training samples. Their main weakness is a tendency to overfit, growing complex trees that memorise noise in the training data and generalise poorly. This is usually controlled by pruning, limiting tree depth, or requiring a minimum number of samples per split or per leaf. Trees can also be unstable, since small changes in the data may produce a very different structure.

| Property | Decision tree |
| --- | --- |
| Interpretability | High |
| Data preparation | Minimal |
| Overfitting risk | High without pruning |
| Handles mixed features | Yes |

Ensembles

The limitations of single trees are largely overcome by ensemble methods that combine many trees. Random forests train many trees, each on a bootstrap sample of the data with a random subset of features considered at each split, and combine their predictions (averaging for regression, majority vote for classification) to reduce variance. Gradient boosting builds trees sequentially, with each tree correcting the errors of the previous ones, and underpins high-performing libraries such as XGBoost, LightGBM, and CatBoost. These ensemble approaches consistently rank among the strongest models for structured, tabular data, which remains common in business and finance.
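The idea that each tree corrects the errors of the previous ones can be illustrated with a bare-bones gradient boosting loop for regression with squared error. This sketch assumes one numerical feature and uses depth-one trees (stumps); the names `fit_stump`, `boost`, `boosted_predict`, and `mse` are invented here, and production libraries add many refinements that are not shown.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None  # (error, threshold, left_mean, right_mean)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]


def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # current errors
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def boosted_predict(model, x):
    correction = sum(stump_predict(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * correction


def mse(model, xs, ys):
    return sum((y - boosted_predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data with three rough levels (illustrative values only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 5.2, 5.0, 5.1]

for rounds in (0, 1, 20):
    model = boost(xs, ys, n_rounds=rounds)
    print(rounds, round(mse(model, xs, ys), 4))
```

The training error shrinks as rounds are added. In practice the number of rounds and the learning rate are tuned on held-out data, because training error alone says nothing about overfitting.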

Malaysian Context — Tabular Models in Finance and Industry

Decision trees and their tree-based ensembles are heavily used across Malaysian industry because much of the country's enterprise data is tabular, the format on which these models excel. In financial services, banks such as Maybank, CIMB, RHB, and Public Bank apply gradient boosting and random forest models for credit scoring, fraud detection, and customer churn prediction, all areas where decision tree ensembles perform strongly. These uses fall under the responsible AI expectations set out in Bank Negara Malaysia's guidance on AI in financial services.

Interpretability is a particular advantage in regulated Malaysian sectors. Because a single decision tree can be read as a sequence of clear rules, it supports the explainability that regulators and auditors expect for decisions such as loan approvals, complementing requirements around fair lending and the Personal Data Protection Act. This makes tree-based models attractive where fully opaque deep learning would be harder to justify.
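Reading a tree as a sequence of rules can be done mechanically. The sketch below walks a small tree and emits one if-then rule per leaf. The tree, its feature names (`monthly_income`, `debt_ratio`), thresholds, and outcomes are entirely hypothetical and do not reflect any real lending policy; `extract_rules` is a helper written for this example.

```python
def extract_rules(tree, conditions=()):
    """Return one human-readable rule for every root-to-leaf path."""
    if "leaf" in tree:
        premise = " and ".join(conditions) if conditions else "always"
        return [f"if {premise} then {tree['leaf']}"]
    test = f"{tree['feature']} <= {tree['threshold']}"
    negated = f"{tree['feature']} > {tree['threshold']}"
    return (extract_rules(tree["left"], conditions + (test,))
            + extract_rules(tree["right"], conditions + (negated,)))


# Hypothetical tree: illustrative features and thresholds only.
loan_tree = {
    "feature": "monthly_income", "threshold": 3000,
    "left": {"leaf": "decline"},
    "right": {
        "feature": "debt_ratio", "threshold": 0.4,
        "left": {"leaf": "approve"},
        "right": {"leaf": "manual review"},
    },
}

rules = extract_rules(loan_tree)
for rule in rules:
    print(rule)
```

Every prediction corresponds to exactly one of these rules, which is what makes a single tree straightforward to audit. An ensemble of hundreds of trees does not reduce to a short rule list in the same way.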

Beyond finance, tree-based models support demand forecasting in retail chains, quality prediction in manufacturing facilities in Penang and the Klang Valley, and risk analysis in insurance and telecommunications firms such as Maxis and TM. They are taught widely in data science programmes at Malaysian universities and in courses funded by the Human Resources Development Corporation (HRD Corp).

For many Malaysian organisations beginning their AI journey, decision tree ensembles offer a practical, well-understood, and resource-efficient starting point before they consider more complex deep learning systems.

References

Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and Regression Trees. Wadsworth.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.
Coursera. Decision Trees in Machine Learning: Two Types (+ Examples).
scikit-learn. Decision Trees — User Guide.

Tags: decision tree, supervised learning, classification, regression, CART

Type	Supervised learning model
Tasks	Classification and regression
Structure	Nodes, branches, leaves
Split criteria	Gini, information gain, variance
Notable algorithms	ID3, C4.5, CART
Related	Random forest, gradient boosting