Principal Component Analysis

An unsupervised statistical technique that transforms correlated variables into a smaller set of uncorrelated components that preserve as much variance in the original data as possible.

4 min read | Last updated May 2026 | Foundations

Principal component analysis (PCA) is an unsupervised statistical method that re-expresses a dataset of correlated variables as a smaller set of orthogonal axes called principal components, ranked by the amount of variance each one captures. First proposed by Karl Pearson in 1901 and formalised by Harold Hotelling in 1933, PCA remains one of the most widely used techniques in statistics, signal processing, and machine learning for exploratory analysis, dimensionality reduction, and feature engineering.

Mathematical foundation

Given a centred n × p data matrix X, with n observations in rows and p variables in columns, PCA finds the orthogonal directions in feature space along which the data varies the most. These directions are the eigenvectors of the sample covariance matrix C = X^T X / (n - 1); their associated eigenvalues quantify the variance captured along each axis. Equivalently, PCA can be computed through the singular value decomposition X = U S V^T, where the columns of V are the principal components (the same eigenvectors, up to sign) and each squared singular value divided by (n - 1) is the variance along the corresponding component.
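The equivalence of the two routes can be checked numerically. The following is a minimal NumPy sketch on synthetic data; the mixing matrix, sample size, and random seed are arbitrary assumptions chosen for illustration, not values from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative assumption): n = 200 observations of
# p = 3 correlated variables, produced by mixing independent noise.
n, p = 200, 3
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.5, 0.2]])
X = rng.normal(size=(n, p)) @ mixing
X = X - X.mean(axis=0)  # centre each column

# Route 1: eigendecomposition of the sample covariance matrix.
cov = (X.T @ X) / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: singular value decomposition of the centred data matrix.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
svd_variances = S**2 / (n - 1)

# The variances should match; the directions match only up to sign,
# so absolute values are compared.
print(np.allclose(eigvals, svd_variances))
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T), atol=1e-6))
```

In practice the SVD route is often preferred because it avoids forming the covariance matrix explicitly, which tends to be numerically better behaved.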

The first principal component captures the largest share of total variance, the second captures the largest share of the remaining variance among directions orthogonal to the first, and so on. Practitioners typically choose the smallest number of components that together explain a target proportion of variance (commonly 80 to 95 percent) and project the data onto that reduced subspace.
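The selection rule can be sketched in a few lines of NumPy. The helper name `choose_n_components` and the small demonstration matrix are hypothetical, introduced only to illustrate the cumulative explained-variance calculation.

```python
import numpy as np

def choose_n_components(X, target=0.90):
    """Return the smallest k whose first k components explain at least
    `target` of the total variance, plus the cumulative ratios."""
    Xc = X - X.mean(axis=0)                      # centre the columns
    S = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    ratio = S**2 / np.sum(S**2)                  # explained-variance ratios
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the target.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(S)), cumulative

# Hypothetical data with orthogonal, zero-mean columns whose squared
# norms are 36, 16 and 4, so the ratios are 36/56, 16/56 and 4/56.
X_demo = np.array([[ 3.0,  2.0,  1.0],
                   [-3.0,  2.0, -1.0],
                   [ 3.0, -2.0, -1.0],
                   [-3.0, -2.0,  1.0]])

k80, cumulative = choose_n_components(X_demo, target=0.80)
k95, _ = choose_n_components(X_demo, target=0.95)
print(k80, k95)            # 2 3
print(cumulative.round(3))
```

The target proportion is a judgment call rather than a fixed rule; a scree plot of the individual ratios is a common complementary check.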

Common variants

Standard PCA assumes linear relationships and is sensitive to feature scale, so inputs are usually standardised to zero mean and unit variance beforehand. Kernel PCA applies the kernel trick to capture non-linear structure. Sparse PCA adds an L1 penalty to the loadings to produce more interpretable components. Incremental PCA processes data in mini-batches to handle datasets that do not fit in memory. Robust PCA separates a low-rank matrix from sparse outliers and is widely used in video background subtraction.
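The sensitivity to feature scale can be illustrated with a small NumPy sketch. The two variables (a height in metres and an income in ringgit) and all of their parameters are invented for this example; the point is only that the variable measured in larger units dominates the first component until the inputs are standardised.

```python
import numpy as np

def first_component(X):
    """Return the first principal direction of X (rows are observations)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(42)
z = rng.normal(size=500)  # shared latent factor, so the two variables correlate

# Hypothetical variables on very different scales.
height_m = 1.7 + 0.1 * (z + 0.5 * rng.normal(size=500))
income_rm = 5000 + 1500 * (z + 0.5 * rng.normal(size=500))
X = np.column_stack([height_m, income_rm])

# Without standardisation the large-unit variable dominates.
raw_pc1 = first_component(X)

# After scaling to zero mean and unit variance, both variables contribute.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_pc1 = first_component(X_std)

print(np.abs(raw_pc1).round(3))  # weight concentrated on income
print(np.abs(std_pc1).round(3))  # roughly equal weights
```

For two standardised variables the first component always weights both equally, since the eigenvectors of a 2 × 2 correlation matrix are fixed; with more variables the weights depend on the correlation structure.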

Applications

PCA is used across many fields. In computer vision, the classical eigenfaces method applies PCA to facial recognition. In genomics, PCA visualises population structure from thousands of single nucleotide polymorphisms. In finance, PCA underpins yield-curve modelling and risk factor extraction. In machine learning pipelines, PCA reduces input dimensionality before clustering, classification, or visualisation in two or three dimensions. Recommender systems use closely related matrix-factorisation approaches that share the same linear-algebraic foundations.

PCA is also a foundational compression and denoising tool: reconstructing data from the top components discards the low-variance directions, where noise often dominates, while preserving the dominant structure.
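A minimal sketch of reconstruction-based denoising follows, assuming a synthetic low-rank signal with added Gaussian noise. The helper `pca_reconstruct`, the dimensions, and the noise level are illustrative assumptions rather than a recommended configuration.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Project X onto its top-k principal components and map back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]               # shape (k, p)
    scores = Xc @ components.T        # shape (n, k): compressed representation
    return scores @ components + mean # shape (n, p): reconstruction

rng = np.random.default_rng(1)
n, p, rank = 300, 20, 2

# Synthetic rank-2 signal embedded in 20 dimensions, plus isotropic noise.
latent = rng.normal(size=(n, rank)) * np.array([5.0, 3.0])
basis = rng.normal(size=(rank, p))
clean = latent @ basis
noisy = clean + 0.5 * rng.normal(size=(n, p))

denoised = pca_reconstruct(noisy, k=rank)

noisy_mse = np.mean((noisy - clean) ** 2)
denoised_mse = np.mean((denoised - clean) ** 2)
print(noisy_mse, denoised_mse)  # the reconstruction should be closer to the clean signal
```

The same `scores` array is the compressed form of the data: storing k values per observation instead of p, together with the k component vectors and the mean.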

Limitations

Because PCA is linear and variance-driven, it can miss patterns that depend on higher-order moments or non-linear manifolds. Components are not directly interpretable in the original feature space without inspecting the loadings. Non-linear alternatives such as t-SNE, UMAP, and deep autoencoders often produce better low-dimensional visualisations, while variational autoencoders provide a generative extension that PCA does not.
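The non-linear limitation can be seen on a deliberately simple case. In this sketch, points on a unit circle are described by a single parameter (the angle), yet PCA assigns half of the variance to each axis, so neither component can be dropped without losing half of the variance. The number of points is an arbitrary choice.

```python
import numpy as np

# Points evenly spaced on a unit circle: intrinsically one-dimensional
# (one angle), but not a linear subspace of the plane.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])

Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)
ratio = S**2 / np.sum(S**2)

# Each component explains about half of the variance, so PCA offers
# no useful one-dimensional summary of this dataset.
print(ratio.round(3))
```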

Malaysian Context — PCA in Local Research and Industry

Malaysian universities use PCA extensively in published research. Universiti Malaya (UM), Universiti Sains Malaysia (USM), and Universiti Putra Malaysia (UPM) have produced peer-reviewed work applying PCA to oil-palm yield modelling, water-quality monitoring along the Klang River, and gene-expression analysis in tropical diseases.

In the financial sector, Bank Negara Malaysia's research papers describe PCA-based decompositions of the Malaysian Government Securities yield curve, while local asset managers such as Public Mutual and Kenanga Investors apply PCA to extract risk factors from the FTSE Bursa Malaysia KLCI constituents.

Petronas and its subsidiaries apply PCA in upstream operations for reservoir characterisation, multivariate process monitoring, and predictive maintenance of refinery equipment. In manufacturing, electronics firms in Penang's Bayan Lepas free industrial zone use PCA-based multivariate statistical process control to monitor wafer fabrication and assembly lines.

Public-sector adoption is supported by the Department of Statistics Malaysia (DOSM) and the Malaysia Digital Economy Corporation (MDEC), which include PCA in HRD Corp-funded data science courses delivered through Cyberjaya training providers.

References

Pearson, K. (1901). On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine, 2(11), 559–572.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6), 417–441.
Jolliffe, I. T., and Cadima, J. (2016). Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A, 374(2065), 20150202.
IBM (2024). What is Principal Component Analysis (PCA)? ibm.com.

Tags: dimensionality reduction, unsupervised learning, statistics, linear algebra

Type	Unsupervised dimensionality reduction
Introduced	1901 (Karl Pearson); developed by Harold Hotelling, 1933
Mathematical basis	Eigendecomposition of covariance matrix / SVD
Key use	Visualisation, feature extraction, noise reduction
Related	Factor analysis, autoencoders, t-SNE, UMAP