Decision Tree

A decision tree is a non-parametric supervised learning model that predicts an outcome by recursively splitting data into branches based on feature values, used for both classification and regression.

4 min read | Last updated June 2026 | Foundations

A decision tree is a non-parametric supervised machine learning model that predicts a target value by following a sequence of simple tests on the input features, arranged in a tree structure. Each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf node holds a predicted result. Decision trees are used for both classification, where leaves represent class labels, and regression, where leaves represent numerical values. Their transparency makes them one of the most interpretable models in machine learning.
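
The node, branch, and leaf structure described above can be sketched in a few lines of Python. This is a minimal illustration and not any particular library's representation: the `Node` and `Leaf` classes, the `predict` function, and the loan-style features and thresholds in `toy_tree` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    # A class label (classification) or a number (regression).
    prediction: Union[str, float]


@dataclass
class Node:
    feature: int                 # index of the feature this node tests
    threshold: float             # test is: x[feature] <= threshold
    left: Union["Node", Leaf]    # branch taken when the test is true
    right: Union["Node", Leaf]   # branch taken when the test is false


def predict(tree: Union[Node, Leaf], x: Sequence[float]) -> Union[str, float]:
    """Follow one test per level from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


# Hypothetical hand-built classification tree.
# Feature 0: income (thousands), feature 1: years employed. Values are made up.
toy_tree = Node(
    feature=0, threshold=30.0,
    left=Leaf("reject"),
    right=Node(feature=1, threshold=2.0, left=Leaf("review"), right=Leaf("approve")),
)

# Hypothetical regression tree: the leaves hold numbers instead of labels.
toy_regressor = Node(feature=0, threshold=0.5, left=Leaf(1.0), right=Leaf(3.0))

print(predict(toy_tree, [50.0, 5.0]))   # approve
print(predict(toy_regressor, [0.2]))    # 1.0
```

Because a prediction is just the path of tests that were passed, the same traversal can be read back as a plain-language rule, which is where the model's interpretability comes from.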

Structure and how it learns

A decision tree partitions the feature space into distinct regions by recursively splitting the data. Starting from the root, the algorithm selects the feature and threshold that best separate the training examples, creates child nodes for each outcome, and repeats the process on each subset until a stopping condition is met, such as a maximum depth or a minimum number of samples per leaf. To decide which split is best, the algorithm uses a criterion that measures how well a split separates the data. For classification, common criteria are Gini impurity and information gain based on entropy; for regression, splits are usually chosen to minimise the variance or the sum of squared residuals within each resulting group.
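
To make the splitting criteria concrete, the sketch below computes Gini impurity, entropy, information gain, and variance, and searches one numeric feature for the threshold with the lowest weighted impurity. The function names and the choice of midpoints between neighbouring values as candidate thresholds are assumptions made for illustration; real implementations differ in detail.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over classes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def variance(values):
    """Mean squared deviation from the mean, used for regression targets."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def weighted_impurity(left, right, impurity=gini):
    """Impurity of the two children, weighted by their share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)


def information_gain(parent, left, right):
    """Drop in entropy obtained by splitting parent into left and right."""
    return entropy(parent) - weighted_impurity(left, right, entropy)


def best_threshold(values, targets, impurity=gini):
    """Return (threshold, weighted impurity) of the best split on one feature.

    Returns (None, parent impurity) when no split improves on the parent.
    """
    pairs = sorted(zip(values, targets), key=lambda p: p[0])
    best = (None, impurity(list(targets)))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = weighted_impurity(left, right, impurity)
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


labels = ["a", "a", "b", "b"]
print(gini(labels), entropy(labels))                       # 0.5 1.0
print(best_threshold([1.0, 2.0, 3.0, 4.0], labels))        # (2.5, 0.0)
```

Passing `impurity=variance` together with numeric targets turns the same search into the regression criterion, since minimising the weighted variance of the two groups is equivalent to minimising their combined sum of squared residuals.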

The CART algorithm, short for Classification and Regression Trees, is a widely used method that builds binary trees and supports both task types. Earlier algorithms such as ID3 and C4.5 popularised entropy-based splitting for classification.
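
The following is a simplified sketch in the spirit of CART's greedy, binary, recursive procedure for classification. It is not the published algorithm in full (it omits pruning, surrogate splits, and regression), and the dictionary layout and the names `best_split`, `build_tree`, and `predict` are assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, min_samples_leaf):
    """Try every feature and midpoint threshold; return (feature, threshold).

    Returns None when no allowed split lowers the weighted Gini impurity.
    """
    best, best_score = None, gini(y)
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if min(len(left), len(right)) < min_samples_leaf:
                continue  # stopping condition: a child would be too small
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth=3, min_samples_leaf=1, depth=0):
    """Recursively grow a binary tree stored as nested dictionaries."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return {"leaf": majority}
    split = best_split(X, y, min_samples_leaf)
    if split is None:
        return {"leaf": majority}
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           max_depth, min_samples_leaf, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            max_depth, min_samples_leaf, depth + 1),
    }


def predict(tree, x):
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Made-up toy data: the label depends only on feature 0.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [6.0, 5.0], [7.0, 4.0], [8.0, 1.0]]
y = ["no", "no", "no", "yes", "yes", "yes"]
tree = build_tree(X, y)
print(tree["feature"], tree["threshold"])   # 0 4.5
print(predict(tree, [2.5, 3.0]))            # no
```

Each split is chosen greedily, using only the samples that reach the current node, so the procedure is fast but offers no guarantee that the finished tree is the best one overall.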

Strengths and weaknesses

Decision trees are easy to understand and to visualise, require little data preparation, and can handle both numerical and categorical features, although some implementations (scikit-learn's among them) expect categorical features to be encoded as numbers first. They also make predictions quickly: for a reasonably balanced tree, the cost of a prediction grows roughly logarithmically with the number of training samples. Their main weakness is a tendency to overfit, growing complex trees that memorise noise in the training data and generalise poorly. This is usually controlled by pruning, limiting tree depth, or requiring a minimum number of samples per split or per leaf. Trees can also be unstable, since small changes in the training data may produce a very different structure.

| Property | Decision tree |
| --- | --- |
| Interpretability | High |
| Data preparation | Minimal |
| Overfitting risk | High without pruning |
| Handles mixed features | Yes |

Ensembles

The limitations of single trees are largely overcome by ensemble methods that combine many trees. Random forests train many trees, each on a bootstrap sample of the data and with a random subset of features considered at each split, then average their predictions (or take a majority vote for classification) to reduce variance. Gradient boosting builds trees sequentially, with each new tree fitted to correct the errors of the ensemble built so far, and underpins high-performing libraries such as XGBoost, LightGBM, and CatBoost. These ensemble approaches consistently rank among the strongest models for structured, tabular data, which remains common in business and finance.
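
The two ensemble ideas can be contrasted with a small sketch that uses one-split regression trees (stumps) on a single feature. It rests on several simplifying assumptions: `bagged_stumps` shows only the bootstrap-and-average part of a random forest (with one feature there is nothing to subsample), `boosted_stumps` is gradient boosting for squared error only, where the errors to correct are simply the residuals, and all function names and default values are hypothetical. Libraries such as XGBoost, LightGBM, and CatBoost are far more elaborate.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data by least squares."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: predict the mean everywhere
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)


def boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.5):
    """Fit each new stump to the residuals left by the ensemble so far."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: a smooth curve that a single stump approximates poorly.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]

single = fit_stump(xs, ys)
boosted = boosted_stumps(xs, ys)
print(mse(boosted, xs, ys) < mse(single, xs, ys))   # True
```

The comparison above is on training error only. On real data both methods are normally judged on held-out samples, because adding boosting rounds keeps lowering training error even after the model has started to overfit.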

References

  1. Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and Regression Trees. Wadsworth.
  2. Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.
  3. Coursera. Decision Trees in Machine Learning: Two Types (+ Examples).
  4. scikit-learn. Decision Trees — User Guide.