Anomaly Detection
A class of machine learning techniques that identifies rare events, observations, or patterns that differ significantly from the majority of data, used for fraud, intrusion, and fault detection.
Anomaly detection, also called outlier detection, is the task of identifying data points, events, or observations that deviate significantly from the expected pattern of the majority. The closely related term novelty detection usually refers to the setting in which the model is trained only on clean, normal data. Anomalies are typically rare, costly, or indicative of an underlying issue, such as a fraudulent transaction, a failing machine, a network intrusion, or a medical abnormality. Because anomalies are by definition uncommon, labelled training data is usually scarce, which makes the task fundamentally different from standard supervised classification and shapes the choice of algorithm and evaluation metric.
Problem framing
Anomaly detection problems are commonly framed in one of three ways. Unsupervised detection assumes no labels and learns the structure of "normal" data from the bulk of the dataset, flagging points that do not fit. Semi-supervised detection is given a clean set of normal examples and trained to recognise deviations from them. Supervised detection assumes both normal and anomalous labels exist; it reduces to a heavily imbalanced classification problem, often addressed with resampling, cost-sensitive loss, or focal loss.
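For the supervised framing, cost-sensitive training can be sketched as a class-weighted log loss. The example below uses the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The function names, the 990/10 class split, and the "lazy" model are illustrative assumptions, not part of any particular system.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (n_classes * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

def weighted_log_loss(labels, probs, weights):
    """Binary cross-entropy in which each sample is scaled by its class weight."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        w = weights[y]
        total += -w * (math.log(p) if y == 1 else math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum

# Hypothetical dataset: 990 normal (0) and 10 anomalous (1) examples.
labels = [0] * 990 + [1] * 10
weights = balanced_class_weights(labels)

# A lazy model that gives every example a 1% anomaly probability.
lazy_probs = [0.01] * len(labels)
unweighted = weighted_log_loss(labels, lazy_probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, lazy_probs, weights)
print(weights, unweighted, weighted)
```

With equal weights the lazy model looks cheap because its mistakes fall on only ten examples; the weighted loss penalises those same mistakes far more heavily, which is the intended effect of cost-sensitive training.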
A second axis is whether anomalies are point (a single observation), contextual (anomalous only in a specific context such as time of day), or collective (a sequence or group of points that together are abnormal even if individually unremarkable).
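The difference between point and contextual anomalies can be illustrated with a per-context baseline. In this minimal sketch, the context labels ("night", "day"), the values, and the threshold of 3 are all made-up assumptions: a reading of 40 is ordinary during the day but unusual at night, and a single global baseline does not notice it.

```python
from collections import defaultdict
import statistics

def contextual_flags(records, threshold=3.0):
    """Flag values that are unusual relative to others sharing the same context."""
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flags = []
    for context, value in records:
        group = by_context[context]
        mean = statistics.fmean(group)
        std = statistics.pstdev(group)
        flags.append(std > 0 and abs(value - mean) / std > threshold)
    return flags

# Hypothetical readings: low at night, high during the day.
records = [("night", v) for v in [2, 3] * 10]
records += [("day", v) for v in [38, 40, 42, 41, 39] * 4]
records.append(("night", 40))  # normal for daytime, abnormal at night

flags = contextual_flags(records)

# Ignoring context: treat every reading as belonging to one group.
global_flags = contextual_flags([("all", v) for _, v in records])

print([i for i, f in enumerate(flags) if f])
print([i for i, f in enumerate(global_flags) if f])
```

Only the last record is flagged when readings are compared within their own context, and nothing is flagged when context is ignored.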
Algorithms
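Classical statistical methods rely on assumptions about the underlying distribution. The Z-score measures how many standard deviations a value lies from the mean; the modified Z-score replaces the mean and standard deviation with the median and the median absolute deviation (MAD), so that the outliers themselves distort the baseline less; and the Mahalanobis distance extends the idea to multivariate data by accounting for the covariance between features. In each case, points beyond a chosen threshold are flagged. These methods are interpretable and cheap but degrade quickly in high dimensions, where covariance estimates become unreliable and distances become less discriminative.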
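A minimal sketch of the Z-score and modified Z-score on a toy sample follows. The readings are invented, and the thresholds (3.0 and 3.5) and the 0.6745 scaling constant are conventional choices rather than values taken from this article. The modified score uses the median and the median absolute deviation (MAD) in place of the mean and standard deviation.

```python
import statistics

def z_scores(values):
    """Distance from the mean in units of (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def modified_z_scores(values):
    """Distance from the median in units of scaled median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [0.6745 * (v - med) / mad for v in values]

# Hypothetical sensor readings with one obvious outlier at the end.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]

flagged_z = [i for i, z in enumerate(z_scores(readings)) if abs(z) > 3.0]
flagged_mod = [i for i, z in enumerate(modified_z_scores(readings)) if abs(z) > 3.5]

print(flagged_z)    # the outlier inflates the mean and std, masking itself
print(flagged_mod)  # the median and MAD are barely affected by it
```

On this small sample the plain Z-score flags nothing, because the outlier inflates the very statistics it is measured against, while the modified Z-score flags only the last reading.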
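Distance- and density-based methods such as k-nearest neighbours (kNN), Local Outlier Factor (LOF), and DBSCAN exploit the geometry of the feature space. kNN-based detectors score a point by its distance to its nearest neighbours, LOF compares the local density around a point with the density around its neighbours, and DBSCAN, a clustering algorithm, labels points in low-density regions as noise. LOF, introduced by Breunig and colleagues in 2000, remains a standard baseline.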
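As a rough illustration of LOF, the sketch below uses scikit-learn's `LocalOutlierFactor` on synthetic data. The data, the random seed, and the choice of `n_neighbors=20` are assumptions made for the example; real feature spaces usually need scaling and tuning.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: one Gaussian cluster plus a single distant point.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_point = np.array([[8.0, 8.0]])
X = np.vstack([cluster, far_point])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)              # -1 = outlier, 1 = inlier
scores = lof.negative_outlier_factor_    # more negative = more anomalous

print(labels[-1], scores[-1])
print(int(np.argmin(scores)))            # index of the most anomalous point
```

The distant point receives the most negative score because its local density is much lower than that of its nearest neighbours, which all sit inside the cluster.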
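Isolation Forest, introduced by Liu and colleagues in 2008, builds an ensemble of random trees and scores each point by the average number of random splits needed to isolate it: anomalies, being few and different, tend to be isolated after fewer splits. Because each tree is built on a small random subsample, the method scales well to large datasets, and it is comparatively robust to high dimensionality.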
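A short sketch with scikit-learn's `IsolationForest` shows the scoring idea. The synthetic data, the two planted outliers, and the parameter values are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: 500 normal points in 4 dimensions plus two planted outliers.
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
planted = np.array([[6.0, 6.0, 6.0, 6.0],
                    [-7.0, 7.0, -7.0, 7.0]])
X = np.vstack([normal, planted])

forest = IsolationForest(n_estimators=200, random_state=0)
forest.fit(X)

scores = forest.score_samples(X)  # lower = easier to isolate = more anomalous
labels = forest.predict(X)        # -1 = anomaly, 1 = normal

lowest_two = set(np.argsort(scores)[:2].tolist())
print(lowest_two, labels[-2:])
```

The planted points sit far from the bulk of the data, so random splits separate them after only a few cuts and they receive the lowest scores. Points on the fringe of the normal cloud may also be labelled -1, which is why the score threshold is usually tuned to the alert budget.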
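One-class support vector machines (OC-SVM) and Support Vector Data Description (SVDD) construct a boundary around the normal data in feature space: OC-SVM separates the data from the origin with a maximum-margin hyperplane in the kernel-induced space, while SVDD encloses it in a minimum-volume hypersphere. Points that fall outside the boundary are flagged as anomalous.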
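The boundary idea can be sketched with scikit-learn's `OneClassSVM`, trained only on examples assumed to be normal. The training data, the test points, and the settings `nu=0.05` and `gamma="scale"` are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Training set assumed to contain only normal behaviour.
rng = np.random.default_rng(1)
train = rng.normal(loc=0.0, scale=1.0, size=(300, 2))

# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(train)

# Two new observations: one near the centre of the data, one far away.
new_points = np.array([[0.1, -0.2],
                       [6.0, 6.0]])
predictions = model.predict(new_points)       # 1 = inside, -1 = outside
margins = model.decision_function(new_points) # signed distance to the boundary

print(predictions, margins)
```

The first point falls inside the learned boundary and the second falls outside it. The signed decision value can be used as a continuous anomaly score instead of the hard label.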
Deep learning approaches are widely used for high-dimensional, structured, or sequential data. Autoencoders are trained to reconstruct normal inputs, so a high reconstruction error on a new input indicates a likely anomaly. Variational autoencoders, GAN-based detectors, and transformer-based time-series models extend this idea. Graph neural networks are increasingly used for anomaly detection in financial transaction networks, where the relational structure carries signal.
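The reconstruction-error principle can be shown without a neural network. The sketch below deliberately swaps in PCA, which behaves like a linear autoencoder with a narrow bottleneck, as a stand-in for a trained deep autoencoder. The synthetic data, the bottleneck width of 2, and the 99th-percentile threshold are assumptions for the example.

```python
import numpy as np

# Synthetic "normal" data: 6 features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

# Fit the linear "autoencoder": the top-2 principal directions.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]

def reconstruction_error(x):
    """Squared error after encoding to 2 dimensions and decoding back."""
    centered = x - mean
    code = centered @ components.T   # encode
    recon = code @ components        # decode
    return np.sum((centered - recon) ** 2, axis=1)

# Threshold chosen from the errors seen on normal data.
threshold = np.quantile(reconstruction_error(normal), 0.99)

# One point that follows the normal structure, one pushed off it.
on_structure = np.array([[1.0, -1.0]]) @ mixing
off_structure = on_structure + 2.0 * vt[-1]  # direction the model cannot encode
errors = reconstruction_error(np.vstack([on_structure, off_structure]))

print(threshold, errors)
```

The point that follows the learned structure is reconstructed almost perfectly, while the point displaced in a direction the bottleneck cannot represent has an error well above the threshold. A deep autoencoder applies the same test with a nonlinear encoder and decoder.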
Evaluation
Because anomalies are rare, accuracy is misleading: a detector that always predicts "normal" achieves 99.9 percent accuracy on a dataset with one anomaly in a thousand while detecting nothing. The area under the precision-recall curve (AUPRC) and precision at K are typically more informative than the area under the ROC curve (AUROC), because the false positive rate used by the ROC curve is computed over the very large normal class and can stay low even when most alerts are false. In production systems, alert volume and analyst workload are practical constraints: a detector that overwhelms analysts with false positives is unusable regardless of its theoretical score.
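The arithmetic behind these metrics can be sketched in a few lines of plain Python. The function names, labels, and scores are hypothetical; average precision is used here as one common step-wise estimate of AUPRC.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of true anomalies that were predicted as anomalies."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives

def precision_at_k(y_true, scores, k):
    """Fraction of true anomalies among the k highest-scoring items."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in ranked[:k]) / k

def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each true anomaly."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(ranked, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits

# One anomaly in a thousand, and a detector that always predicts "normal".
y_true = [0] * 999 + [1]
always_normal = [0] * 1000
acc = accuracy(y_true, always_normal)
rec = recall(y_true, always_normal)

# A small hypothetical ranking: two anomalies among ten scored items.
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.3, 0.7, 0.2, 0.1, 0.4]
p_at_3 = precision_at_k(labels, scores, k=3)
ap = average_precision(labels, scores)

print(acc, rec, p_at_3, ap)
```

The always-normal detector scores 0.999 on accuracy but 0.0 on recall. In the ranking example, two of the top three alerts are real anomalies, which is the quantity an analyst with capacity for three reviews actually experiences.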
Applications
Financial fraud is the most economically significant application. Card-not-present fraud, account takeover, money laundering, and synthetic identity fraud are detected by ensembles that combine rule-based features, supervised classifiers, and unsupervised anomaly scores. According to a 2025 industry survey by Feedzai, around 90 percent of the banks surveyed reported using AI and machine learning in fraud prevention.
Cybersecurity uses anomaly detection for intrusion detection (network and host), insider threat detection, and malware classification. The signal-to-noise ratio is low, so modern systems pair anomaly scores with threat intelligence and behaviour analytics.
Predictive maintenance monitors vibration, temperature, current, and acoustic signatures of industrial equipment to predict failures before they occur. Petrochemical, power, and manufacturing operators use these systems to reduce unplanned downtime.
Healthcare applications include early warning scores for clinical deterioration, anomaly detection in ECG and EEG signals, and rare disease screening.
Limitations
Anomaly detection inherits the challenges of learning from long-tailed data, where the events of interest are rare and varied. Concept drift, the change over time in what "normal" looks like, requires ongoing monitoring and regular retraining. Adversarial actors actively shape their behaviour to look normal, particularly in fraud and intrusion contexts. Explainability is essential because false positives have real costs (a frozen account, a blocked transaction), but it is difficult to provide for deep methods. Many production systems combine multiple detectors with human-in-the-loop review rather than relying on a single model.
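The effect of concept drift can be illustrated by comparing a baseline that is fitted once with one that is re-estimated over a sliding window. This is a minimal sketch under stated assumptions: the synthetic stream, the window of 50 points, and the threshold of 4 standard deviations are invented for the example, and a sliding window is only one simple form of continuous retraining.

```python
from collections import deque
import statistics

def frozen_flags(stream, train_size=50, threshold=4.0):
    """Fit mean and std once on the first points, then never update them."""
    mean = statistics.fmean(stream[:train_size])
    std = statistics.pstdev(stream[:train_size])
    return [abs(x - mean) / std > threshold for x in stream]

def rolling_flags(stream, window=50, threshold=4.0):
    """Re-estimate mean and std from the most recent window before each point."""
    history = deque(maxlen=window)
    flags = []
    for x in stream:
        if len(history) == window:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            flags.append(std > 0 and abs(x - mean) / std > threshold)
        else:
            flags.append(False)  # warm-up: not enough history yet
        history.append(x)
    return flags

# Synthetic stream: slow upward drift, a small repeating wiggle, one real spike.
wiggle = [-0.5, 0.2, 0.4, -0.1, 0.0]
stream = [10 + 0.02 * t + wiggle[t % 5] for t in range(500)]
stream[300] += 8.0  # the only genuine anomaly

frozen_hits = [i for i, f in enumerate(frozen_flags(stream)) if f]
rolling_hits = [i for i, f in enumerate(rolling_flags(stream)) if f]

print(len(frozen_hits), rolling_hits)
```

The frozen baseline ends up flagging most of the later stream, because ordinary values have drifted away from what it learned, while the rolling baseline flags only the spike. The trade-off is that a baseline which adapts too quickly can also absorb a slow attack or a developing fault.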
References
- Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly Detection: A Survey. ACM Computing Surveys.
- Breunig, M. M., Kriegel, H.-P., Ng, R. T., & Sander, J. (2000). LOF: Identifying Density-Based Local Outliers. SIGMOD.
- Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). Isolation Forest. ICDM.
- Bank Negara Malaysia. (2023). Risk Management in Technology (RMiT) Policy Document. bnm.gov.my.
- Feedzai. (2025). AI Trends in Fraud and Financial Crime Report 2025. feedzai.com.