XGBoost

XGBoost (Extreme Gradient Boosting) is an open-source machine learning library that provides a fast, regularised gradient boosting framework, widely used for classification, regression, and ranking on tabular data.

Last updated June 2026

XGBoost, short for Extreme Gradient Boosting, is an open-source machine learning library that implements a fast and scalable gradient boosting framework. First released in 2014, it provides interfaces for many programming languages, including C++, Python, R, Java, Scala, Julia, and Perl, and can run on a single machine as well as on distributed processing frameworks such as Apache Hadoop, Apache Spark, Apache Flink, and Dask. XGBoost became one of the most influential tools in applied machine learning, particularly for structured or tabular data, and was for years a dominant method in data-science competitions.

Gradient boosting background

XGBoost belongs to the family of ensemble methods known as gradient boosting, in which many simple models, usually decision trees, are combined into a single strong predictor. The trees are added sequentially, and each new tree is trained to correct the errors of the trees that came before it: it is fit to the negative gradient of the loss with respect to the current predictions, which for squared error is simply the residual. Each round therefore reduces the model's overall training error a little further. This contrasts with bagging methods such as random forests, which train trees independently and average them. Gradient boosting tends to produce highly accurate models on tabular data, and XGBoost is an optimised, regularised implementation of the idea.
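The sequential idea can be sketched in a few lines of plain Python. This is a toy illustration, not XGBoost's algorithm: it assumes squared-error loss, a single feature, and depth-1 trees (stumps), and the names `fit_stump`, `boost`, and the sample data are made up for the example.

```python
# Toy gradient boosting for squared-error regression with depth-1 trees.
# Illustrative only: it shows the residual-fitting loop, not XGBoost itself.

def fit_stump(xs, residuals):
    """Pick the threshold on one feature that best splits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value cannot split
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Add stumps one at a time, each fit to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for x, p in zip(xs, preds)]
        mse_history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, mse_history


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1]
base, stumps, mse_history = boost(xs, ys)
print(f"MSE after round 1: {mse_history[0]:.4f}, after round 20: {mse_history[-1]:.4f}")
```

The training error recorded in `mse_history` never increases from one round to the next, which is the "gradually reducing the error" behaviour described above. A bagging method would instead fit every tree to the original targets and average the results.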

Key features

XGBoost is engineered for speed, memory efficiency, and predictive accuracy. It uses second-order gradient information, meaning it considers both the slope (first derivative) and the curvature (second derivative) of the loss function when building trees, which can improve convergence. Although the trees themselves are added sequentially, the construction of each tree is parallelised, in particular the search for split points across features, and a cache-aware prefetching algorithm reduces runtime on large datasets. The authors reported that the system ran more than ten times faster than existing popular solutions on a single machine.
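To make the second-order idea concrete, the sketch below computes the gradient and Hessian of the logistic loss for a handful of examples and turns them into a single leaf value using the Newton-style formula from the Chen and Guestrin paper, w = -G / (H + lambda), where G and H are the summed gradients and Hessians in the leaf. The function names and sample labels are assumptions made for illustration; this is not the library's code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad_hess(label, raw_score):
    """Slope and curvature of the log loss with respect to the raw score."""
    p = sigmoid(raw_score)
    return p - label, p * (1.0 - p)


def leaf_weight(grads, hessians, reg_lambda=1.0):
    """Newton-style leaf value: -G / (H + lambda)."""
    return -sum(grads) / (sum(hessians) + reg_lambda)


# Four examples that fall into the same leaf, all starting from a raw score of 0.
labels = [1, 1, 1, 0]
raw_scores = [0.0, 0.0, 0.0, 0.0]

pairs = [logistic_grad_hess(y, s) for y, s in zip(labels, raw_scores)]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]

w = leaf_weight(grads, hessians)  # G = -1.0, H = 1.0, lambda = 1.0
print(w)  # 0.5
```

With `reg_lambda=0.0` the same leaf would get the value 1.0, so the L2 term visibly shrinks the step. A first-order method would use only `grads` and rely on a learning rate to pick the step size.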

The library also handles common practical data problems. It manages missing values automatically by learning a default direction at each tree split, and it uses a sparsity-aware algorithm to deal efficiently with sparse data such as one-hot-encoded categorical features. To control overfitting, XGBoost adds L1 (lasso) and L2 (ridge) regularisation terms on the leaf weights to its objective function, penalising model complexity so that the trees generalise better. It also supports tree pruning, column and row subsampling, and a wide range of tunable hyperparameters that practitioners adjust to balance accuracy and generalisation.
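The default-direction idea can be illustrated with a small sketch: for a candidate split, send all rows with a missing value to the left, then to the right, and keep whichever choice gives the larger regularised gain. The gain formula follows the one in the Chen and Guestrin paper with only the L2 term (the L1 term and the per-leaf penalty are left out for brevity). The function names, the `None` encoding of missing values, and the sample rows are assumptions for this example, and the real library does this far more efficiently.

```python
def score(g_sum, h_sum, reg_lambda):
    """Structure score of a leaf: G^2 / (H + lambda)."""
    return g_sum * g_sum / (h_sum + reg_lambda)


def choose_default_direction(rows, threshold, reg_lambda=1.0):
    """rows holds (feature_value_or_None, gradient, hessian) tuples.

    Returns the direction ("left" or "right") for missing values that gives
    the larger regularised gain at this threshold, together with that gain.
    """
    sums = {"left": [0.0, 0.0], "right": [0.0, 0.0], "missing": [0.0, 0.0]}
    for value, grad, hess in rows:
        if value is None:
            bucket = "missing"
        elif value < threshold:
            bucket = "left"
        else:
            bucket = "right"
        sums[bucket][0] += grad
        sums[bucket][1] += hess

    (gl, hl), (gr, hr), (gm, hm) = sums["left"], sums["right"], sums["missing"]
    parent = score(gl + gr + gm, hl + hr + hm, reg_lambda)
    gain_if_left = 0.5 * (score(gl + gm, hl + hm, reg_lambda)
                          + score(gr, hr, reg_lambda) - parent)
    gain_if_right = 0.5 * (score(gl, hl, reg_lambda)
                           + score(gr + gm, hr + hm, reg_lambda) - parent)
    if gain_if_left >= gain_if_right:
        return "left", gain_if_left
    return "right", gain_if_right


# Squared-error style statistics: gradient = prediction - target, hessian = 1.
rows = [
    (1.0, -1.0, 1.0), (2.0, -1.0, 1.0),    # small feature values, under-predicted
    (5.0, 1.0, 1.0), (6.0, 1.0, 1.0),      # large feature values, over-predicted
    (None, 1.0, 1.0), (None, 1.0, 1.0),    # missing values, also over-predicted
]
direction, gain = choose_default_direction(rows, threshold=3.5)
print(direction, round(gain, 3))
```

Here the rows with missing values have gradients that resemble the right-hand group, so sending them right yields the larger gain. Because only the non-missing entries have to be scanned to build the left and right sums, the same trick keeps split finding cheap on sparse inputs.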

XGBoost supports the standard supervised learning tasks, including regression, binary and multiclass classification, and ranking, and it can be accelerated using graphics processing units for very large datasets.
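As a rough guide to how these tasks and tuning options are selected in practice, the sketch below builds parameter dictionaries of the kind passed to `xgboost.train(params, dtrain)`. The parameter and objective names are those used by XGBoost 2.x (older releases selected the GPU differently), while the helper `make_params`, the chosen values, and the three-class setting are assumptions for illustration, not recommended defaults. The block only constructs dictionaries, so it runs without the library installed.

```python
# Shared tuning knobs mentioned in the article; values are arbitrary examples.
COMMON = {
    "eta": 0.1,               # learning rate (shrinkage)
    "max_depth": 6,           # maximum tree depth
    "gamma": 0.0,             # minimum loss reduction needed to keep a split
    "subsample": 0.8,         # row subsampling per tree
    "colsample_bytree": 0.8,  # column subsampling per tree
    "alpha": 0.0,             # L1 regularisation on leaf weights
    "lambda": 1.0,            # L2 regularisation on leaf weights
    "tree_method": "hist",    # histogram-based split finding
}

# One objective per supervised task the library supports.
TASK_OBJECTIVES = {
    "regression": {"objective": "reg:squarederror"},
    "binary": {"objective": "binary:logistic"},
    "multiclass": {"objective": "multi:softprob", "num_class": 3},
    "ranking": {"objective": "rank:pairwise"},
}


def make_params(task, use_gpu=False):
    """Hypothetical helper: merge the shared knobs with a task objective."""
    params = {**COMMON, **TASK_OBJECTIVES[task]}
    if use_gpu:
        params["device"] = "cuda"  # XGBoost 2.x style GPU selection
    return params


for task in TASK_OBJECTIVES:
    print(task, make_params(task)["objective"])
```

Keeping the task-specific objective separate from the shared knobs makes it easy to reuse one tuned configuration across tasks, or to switch the same configuration between CPU and GPU training.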

Use and ecosystem

Because it is fast, accurate, and robust on the kind of tabular data common in business and science, XGBoost is widely used across finance, insurance, healthcare, fraud detection, recommendation, and many other domains. It is frequently compared with related gradient boosting libraries such as LightGBM and CatBoost, which make different design trade-offs, and with random forests; the best choice depends on the dataset and constraints. Despite the rise of deep learning, gradient boosting libraries like XGBoost remain a default choice for structured-data problems, where they often match or exceed neural networks while being faster to train and easier to interpret.

| Aspect | XGBoost |
|--------|---------|
| Method | Gradient-boosted decision trees |
| Regularisation | L1 and L2 |
| Missing values | Handled automatically |
| Hardware | CPU and GPU |
| Distributed | Hadoop, Spark, Flink, Dask |
| Licence | Apache 2.0 |

References

  1. Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of KDD 2016.
  2. dmlc. (2026). XGBoost GitHub Repository. https://github.com/dmlc/xgboost
  3. Wikipedia. (2026). XGBoost. https://en.wikipedia.org/wiki/XGBoost
  4. IBM. (2025). What is XGBoost?