AIWiki

Principal Component Analysis

An unsupervised statistical technique that transforms correlated variables into a smaller set of uncorrelated components that preserve as much variance in the original data as possible.

4 min read · Last updated May 2026 · Foundations

Principal component analysis (PCA) is an unsupervised statistical method that re-expresses a dataset of correlated variables as a smaller set of orthogonal axes called principal components, ranked by the amount of variance each one captures. First proposed by Karl Pearson in 1901 and formalised by Harold Hotelling in 1933, PCA remains one of the most widely used techniques in statistics, signal processing, and machine learning for exploratory analysis, dimensionality reduction, and feature engineering.

Mathematical foundation

Given a centred data matrix X with n observations (rows) and p variables (columns), PCA finds the orthogonal directions in feature space along which the data varies the most. These directions are the eigenvectors of the sample covariance matrix X^T X / (n - 1); the associated eigenvalues give the variance captured along each direction. Equivalently, PCA can be computed through the singular value decomposition X = U S V^T: the columns of V are the principal components, the columns of U S are the coordinates (scores) of each observation along them, and the squared singular values divided by n - 1 are the variances along each component.
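
The equivalence between the two routes can be checked numerically. The following is a minimal sketch using NumPy on synthetic data; the mixing matrix, sample size, and variable names are illustrative assumptions, not part of any reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data (assumption): n = 200 observations of p = 3 correlated variables.
n, p = 200, 3
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.3]])
X = rng.normal(size=(n, p)) @ mixing
X = X - X.mean(axis=0)  # centre each column

# Route 1: eigendecomposition of the sample covariance matrix.
cov = (X.T @ X) / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # re-sort to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: singular value decomposition X = U S V^T.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
svd_variances = S**2 / (n - 1)

# The variances agree, and the directions agree up to a sign flip per column.
variances_match = np.allclose(eigvals, svd_variances)
directions_match = np.allclose(np.abs(eigvecs), np.abs(Vt.T), atol=1e-6)

# Projecting X onto the columns of V gives the same scores as U S.
scores_match = np.allclose(X @ Vt.T, U * S)
```

The sign ambiguity is inherent: if v is an eigenvector, so is -v, so comparisons between implementations should ignore the sign of each component. In practice the SVD route is usually preferred because it avoids forming the covariance matrix explicitly.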

The first principal component captures the largest share of total variance, the second captures the largest remaining share subject to being orthogonal to the first, and so on. Practitioners typically choose the smallest number of components whose cumulative explained variance reaches a target proportion (commonly 80 to 95 percent) and project the data onto that reduced subspace.
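
One way to apply this selection rule is to accumulate the explained-variance ratios and stop at the first component that reaches the target. The sketch below assumes NumPy; the helper name choose_n_components, the 90 percent default, and the synthetic column scales are hypothetical choices made for illustration.

```python
import numpy as np

def choose_n_components(X, target=0.90):
    """Return the smallest k whose top-k components explain at least
    `target` of the total variance, plus the cumulative ratios."""
    Xc = X - X.mean(axis=0)                    # centre each column
    S = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    ratios = S**2 / np.sum(S**2)               # explained-variance ratio per component
    cumulative = np.cumsum(ratios)
    # First index whose cumulative ratio is >= target, converted to a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(S)), cumulative

# Illustrative data (assumption): five independent columns with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])

k, cumulative = choose_n_components(X, target=0.90)

# Project onto the selected subspace.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:k].T                      # shape (500, k)
```

The target proportion is a convention rather than a rule; inspecting the cumulative curve (a scree plot) is a common complement to a fixed threshold.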

Common variants

Standard PCA assumes linear relationships and is sensitive to feature scale, so inputs are usually standardised to zero mean and unit variance beforehand. Kernel PCA applies the kernel trick to capture non-linear structure. Sparse PCA adds an L1 penalty to the loadings to produce more interpretable components. Incremental PCA processes data in mini-batches to handle datasets that do not fit in memory. Robust PCA separates a low-rank matrix from sparse outliers and is widely used in video background subtraction.
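
The sensitivity to feature scale can be seen with two features that carry the same underlying signal but are measured in very different units. This is a minimal NumPy sketch; the feature names, units, and noise levels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical features: height in metres and income in currency units.
# Both depend on the same latent factor, but their numeric scales differ hugely.
n = 500
latent = rng.normal(size=n)
height_m = 1.70 + 0.10 * (0.8 * latent + 0.6 * rng.normal(size=n))
income = 50_000 + 15_000 * (0.8 * latent + 0.6 * rng.normal(size=n))
X = np.column_stack([height_m, income])

def first_component(data):
    """Direction of the first principal component (unit vector, sign arbitrary)."""
    centred = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return Vt[0]

# Without standardisation, the large-scale feature dominates the first component.
raw_pc1 = first_component(X)

# After standardising to zero mean and unit variance, both features contribute.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_pc1 = first_component(Z)
```

On the raw data the first component points almost entirely along the income axis; after standardisation the two loadings have equal magnitude, which is always the case for two standardised variables. Whether to standardise depends on whether the original units are meaningful for the analysis.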

Applications

PCA is used across many fields. In computer vision, the classical eigenfaces method applies PCA to facial recognition. In genomics, PCA visualises population structure from thousands of single nucleotide polymorphisms. In finance, PCA underpins yield-curve modelling and risk factor extraction. In machine learning pipelines, PCA reduces input dimensionality before clustering, classification, or visualisation in two or three dimensions. Recommender systems use closely related matrix-factorisation approaches that share the same linear-algebraic foundations.

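PCA is also a foundational compression and denoising tool: storing only the scores on the top components compresses the data, and reconstructing from those components discards the low-variance directions, which often consist mostly of noise, while preserving the dominant structure.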
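
The reconstruction step can be sketched as follows. The example assumes NumPy and a synthetic dataset that lies on a two-dimensional subspace with added Gaussian noise; the dimensions, noise level, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (assumption): 300 observations on a 2-dimensional subspace
# of a 10-dimensional space, corrupted by isotropic Gaussian noise.
n, p, k = 300, 10, 2
clean = 3.0 * rng.normal(size=(n, k)) @ rng.normal(size=(k, p))
noisy = clean + rng.normal(scale=0.5, size=(n, p))

# Fit PCA on the noisy data.
mean = noisy.mean(axis=0)
_, _, Vt = np.linalg.svd(noisy - mean, full_matrices=False)

# Compress: keep only the scores on the top-k components.
scores = (noisy - mean) @ Vt[:k].T         # shape (n, k)

# Reconstruct: map the scores back to the original feature space.
reconstructed = scores @ Vt[:k] + mean     # shape (n, p)

# Mean squared error against the noise-free data, before and after.
err_noisy = np.mean((noisy - clean) ** 2)
err_reconstructed = np.mean((reconstructed - clean) ** 2)
```

Here the reconstruction is closer to the clean data than the noisy input is, because the noise spread across the eight discarded directions is removed. The benefit depends on the signal actually concentrating in a few high-variance directions; noise that falls inside the retained subspace is kept.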

Limitations

Because PCA is linear and variance-driven, it can miss patterns that depend on higher-order moments or lie on non-linear manifolds. Components are linear combinations of all the original features, so they are not directly interpretable without inspecting the loadings. Non-linear alternatives such as t-SNE, UMAP, and deep autoencoders often produce better low-dimensional visualisations, while variational autoencoders provide a generative extension that standard PCA does not.
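
A small example of the non-linear case: points on a circle have a single underlying degree of freedom (the angle), yet no linear projection captures it. The NumPy sketch below uses an evenly sampled unit circle as an illustrative assumption.

```python
import numpy as np

# Points on a unit circle: intrinsically one-dimensional (parameterised by angle),
# but embedded non-linearly in two dimensions.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
X = X - X.mean(axis=0)

# Explained-variance ratio of each principal component.
S = np.linalg.svd(X, compute_uv=False)
explained = S**2 / np.sum(S**2)
```

Each component explains half of the variance, so PCA finds no preferred direction, and projecting onto a single component would map distinct points of the circle onto the same coordinate. Methods that model non-linear structure, such as kernel PCA, are designed for cases like this.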

References

  1. Pearson, K. (1901). On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine, 2(11), 559–572.
  2. Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6), 417–441.
  3. Jolliffe, I. T., and Cadima, J. (2016). Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A, 374(2065), 20150202.
  4. IBM (2024). What is Principal Component Analysis (PCA)? ibm.com.