Machine Learning

Machine learning is a subfield of artificial intelligence in which systems improve their performance on tasks through experience — by automatically learning patterns from data rather than following explicitly programmed rules.


Machine learning (ML) is the scientific study of algorithms and statistical models that enable computer systems to improve their performance on a specific task through experience. Rather than being explicitly programmed with rules, an ML system is given examples (training data) from which it infers a model that can generalise to new, unseen inputs.

The term was coined by Arthur Samuel in 1959 in the context of a checkers-playing program that improved its strategy through self-play.[1] Today, ML underpins most commercially deployed AI, from spam filters to medical imaging diagnostics.

Learning Paradigms

Supervised Learning

The system is trained on labelled examples — input-output pairs — and learns a mapping function. Common tasks include:

  • Classification — assigning inputs to discrete categories (e.g., spam/not spam, tumour/benign)
  • Regression — predicting continuous values (e.g., house prices, demand forecasting)

Algorithms: linear regression, logistic regression, decision trees, random forests, gradient boosting (XGBoost, LightGBM), support vector machines, neural networks.
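
As a minimal sketch of the supervised workflow (assuming scikit-learn and its bundled breast-cancer dataset), the example below fits a logistic regression classifier on labelled data and scores it on a held-out split; the dataset and model choice are purely illustrative.

```python
# Minimal supervised-learning sketch: learn a mapping from labelled examples,
# then evaluate it on held-out data. Dataset and model are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)                  # labelled input-output pairs
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)                    # fit a mapping from X to y
model.fit(X_train, y_train)

print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```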

Unsupervised Learning

The system discovers hidden structure in unlabelled data; a brief clustering sketch follows the list below.

  • Clustering — grouping similar data points (k-means, DBSCAN, hierarchical clustering)
  • Dimensionality reduction — finding compact representations (PCA, t-SNE, UMAP)
  • Anomaly detection — identifying outliers
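
For concreteness, the sketch below groups synthetic 2-D points with k-means using scikit-learn; the synthetic data and the choice of three clusters are assumptions made only for illustration.

```python
# Minimal unsupervised-learning sketch: group unlabelled points with k-means.
# The synthetic data and the choice of 3 clusters are purely illustrative.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # labels are discarded

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])
```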

Reinforcement Learning (RL)

An agent learns by interacting with an environment, receiving rewards or penalties for actions. RL produced landmark results including AlphaGo (2016) and AlphaStar (2019), and underlies modern RLHF (Reinforcement Learning from Human Feedback) used to align LLMs.
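
To make the agent-environment loop concrete, the following is a minimal tabular Q-learning sketch on a toy five-state corridor; the environment, reward scheme, and constants are assumptions chosen for illustration rather than any specific published method.

```python
# Minimal tabular Q-learning sketch: a toy 5-state corridor where the agent
# earns +1 for reaching the rightmost state. All constants are illustrative.
import random

N_STATES, ACTIONS = 5, (-1, +1)                    # positions 0..4; step left or right
alpha, gamma, epsilon = 0.5, 0.9, 0.2              # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    return max(ACTIONS, key=lambda a: Q[(s, a)])

for episode in range(500):
    s = 0
    while s != N_STATES - 1:                       # episode ends at the goal state
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Q-learning update: nudge Q(s, a) toward reward + discounted best future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in ACTIONS) - Q[(s, a)])
        s = s_next

print("greedy policy:", {s: greedy(s) for s in range(N_STATES - 1)})
```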

Self-supervised / Foundation Models

A modern paradigm where models are pre-trained on massive unlabelled corpora using proxy tasks (e.g., predicting masked tokens). The resulting representations transfer well to downstream tasks. Large language models such as GPT-4, Claude, and Gemini are trained this way.
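
As a small illustration of what a masked-token model learns, the snippet below queries a pre-trained masked language model through the Hugging Face fill-mask pipeline; it assumes the publicly available bert-base-uncased checkpoint can be downloaded.

```python
# Illustrative use of a self-supervised (masked-token) pre-trained model via the
# Hugging Face fill-mask pipeline; assumes bert-base-uncased is available.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("Machine learning systems learn patterns from [MASK]."):
    print(f'{pred["token_str"]:>12}  p={pred["score"]:.3f}')
```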

The ML Pipeline

A typical production ML system involves the following steps (steps 4 to 6 are sketched in code after the list):

  1. Data collection and labelling — the most time-consuming phase in practice
  2. Exploratory Data Analysis (EDA) — understanding distributions, correlations, and anomalies
  3. Feature engineering — transforming raw data into informative input representations
  4. Model selection and training — choosing an algorithm and fitting it to training data
  5. Evaluation — measuring performance on held-out data (accuracy, F1, AUC-ROC, etc.)
  6. Hyperparameter tuning — optimising model configuration
  7. Deployment — serving predictions in production (REST API, batch inference, edge)
  8. Monitoring — detecting data drift, performance degradation, concept drift
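
To ground steps 4 to 6, the sketch below wires preprocessing and a model into a single scikit-learn Pipeline and tunes one hyperparameter by cross-validated grid search; the dataset, estimator, and search grid are illustrative choices, not a recommended recipe.

```python
# Illustrative sketch of steps 4-6: training, evaluation, and hyperparameter
# tuning, with preprocessing and model wrapped in one scikit-learn Pipeline.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)      # tune the SVM's C
search.fit(X_train, y_train)

print("best C:", search.best_params_["clf__C"])
print("held-out accuracy:", search.score(X_test, y_test))        # refit on the best setting
```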

Common Pitfalls

  • Overfitting — model memorises training data but fails to generalise (demonstrated in the sketch after this list)
  • Data leakage — information from the test set contaminates training
  • Class imbalance — biased predictions when one class heavily dominates
  • Distribution shift — training and deployment data differ in important ways
  • Spurious correlations — model relies on coincidental patterns rather than causal features
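
Overfitting is easy to observe directly: in the sketch below (toy data and an unconstrained decision tree, both chosen only for illustration) the model scores near-perfectly on the training split but noticeably worse on held-out data.

```python
# Overfitting in miniature: an unconstrained decision tree memorises the
# training split yet generalises worse to held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)   # no depth limit
print("train accuracy:", tree.score(X_train, y_train))                # typically 1.0
print("test accuracy: ", tree.score(X_test, y_test))                  # noticeably lower
```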

Evaluation Metrics

| Task | Primary Metrics |
|------|-----------------|
| Classification (balanced) | Accuracy, F1-score, AUC-ROC |
| Classification (imbalanced) | Precision-Recall AUC, MCC |
| Regression | RMSE, MAE, R² |
| Ranking | NDCG, MAP |
| Clustering | Silhouette score, Davies-Bouldin |
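
For reference, the snippet below computes a few of these metrics with scikit-learn on small made-up arrays.

```python
# Computing a few common evaluation metrics with scikit-learn on toy values.
from sklearn.metrics import f1_score, mean_absolute_error, roc_auc_score

y_true, y_pred = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]              # toy classification labels
y_score = [0.2, 0.9, 0.4, 0.1, 0.8]                            # predicted probabilities
print("F1:     ", f1_score(y_true, y_pred))
print("AUC-ROC:", roc_auc_score(y_true, y_score))

y_reg_true, y_reg_pred = [3.0, 2.5, 4.0], [2.8, 2.7, 3.5]      # toy regression values
print("MAE:    ", mean_absolute_error(y_reg_true, y_reg_pred))
```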

Key Libraries and Frameworks

  • scikit-learn — gold standard for classical ML in Python
  • XGBoost / LightGBM / CatBoost — gradient boosting; dominant in tabular-data competitions (usage sketch after this list)
  • TensorFlow / Keras — Google's deep learning framework
  • PyTorch — deep learning framework originally developed at Facebook (now Meta); preferred in research
  • Hugging Face Transformers — pre-trained transformer models for NLP
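
As a brief usage sketch, the example below trains a gradient-boosted classifier with XGBoost's scikit-learn-compatible interface; the dataset and hyperparameters are arbitrary illustrative choices.

```python
# Brief usage sketch: gradient boosting on tabular data via XGBoost's
# scikit-learn-compatible estimator API. Hyperparameters are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

booster = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
booster.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, booster.predict(X_test)))
```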

References

  1. Samuel, A.L. (1959). "Some Studies in Machine Learning Using the Game of Checkers." IBM Journal of Research and Development, 3(3), 210–229.
  2. Mitchell, T.M. (1997). Machine Learning. McGraw-Hill.
  3. MDEC (2023). State of AI in Malaysia 2023. Malaysia Digital Economy Corporation.