Regularisation (Machine Learning)

Regularisation is a collection of techniques in machine learning that constrain models during training to reduce overfitting and improve generalisation to unseen data.

5 min read · Last updated May 2026 · Foundations

Regularisation in machine learning is the umbrella term for techniques that modify the training procedure to discourage models from fitting noise in the training data and instead encourage them to learn patterns that generalise to unseen inputs. Without regularisation, sufficiently expressive models — especially deep neural networks — tend to memorise their training set, achieving low training error while performing poorly on validation and production data. Regularisation is therefore one of the most important practical levers in supervised learning.

The bias–variance perspective

A common way to think about regularisation is through the bias–variance decomposition of generalisation error. Underfitting reflects high bias: the model is too inflexible to capture the underlying signal. Overfitting reflects high variance: the model captures patterns that vary across draws from the same distribution. Regularisation shifts the trade-off toward higher bias and lower variance, accepting a small loss in training fit in exchange for a larger reduction in error on unseen data.
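For squared-error loss, the decomposition mentioned above can be written out explicitly. The notation here is an assumption made for illustration and is not taken from the article: the data follow y = f(x) + ε with zero-mean noise of variance σ², f̂ is the fitted model, and the expectation is over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Regularisation typically raises the first term slightly and lowers the second; the third term is unaffected by any modelling choice.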

Classical explicit regularisation

Several methods modify the loss function directly.

L2 regularisation adds a penalty proportional to the sum of squared weights. In linear regression this is ridge regression, and under plain SGD it is equivalent to weight decay. It discourages large weights, which tends to produce smoother decision boundaries and better-conditioned optimisation. With adaptive optimisers such as Adam the two are no longer equivalent, which motivates decoupled weight decay (AdamW), now a default in most deep learning training recipes.
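A minimal sketch of the two update rules may help here. It assumes the penalty is written as (lam / 2) times the sum of squared weights and uses plain SGD on Python lists; the function names are hypothetical and not taken from any library.

```python
def sgd_step_l2(weights, grads, lr, lam):
    """SGD step on loss + (lam / 2) * sum(w^2): the penalty adds lam * w to each gradient."""
    penalised = [g + lam * w for w, g in zip(weights, grads)]
    return [w - lr * g for w, g in zip(weights, penalised)]


def sgd_step_decoupled(weights, grads, lr, lam):
    """Decoupled weight decay: shrink each weight directly, then apply the unpenalised gradient."""
    return [w - lr * lam * w - lr * g for w, g in zip(weights, grads)]


if __name__ == "__main__":
    w, g = [1.0, -2.0], [0.5, 0.5]
    print(sgd_step_l2(w, g, lr=0.1, lam=0.01))
    print(sgd_step_decoupled(w, g, lr=0.1, lam=0.01))
```

Under plain SGD the two functions produce the same update up to floating-point rounding. They diverge once the gradient is rescaled per parameter, as in Adam, because the penalty term is then rescaled along with it in the first form but not in the second.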

L1 regularisation (lasso) adds a penalty proportional to the sum of absolute weight values. Because the L1 penalty has a non-smooth point at zero, it tends to drive some weights exactly to zero, producing sparse models that are easier to interpret and cheaper to deploy.
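One way to see where the exact zeros come from is the soft-thresholding operator, which proximal-gradient solvers for the lasso apply after each gradient step. The sketch below is illustrative only; the function name and the example values are assumptions.

```python
def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t, and return exactly 0.0 when |w| <= t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


if __name__ == "__main__":
    weights = [1.0, -0.125, 0.25, -2.0]
    print([soft_threshold(w, 0.25) for w in weights])  # [0.75, 0.0, 0.0, -1.75]
```

Weights whose magnitude is at or below the threshold are set to zero rather than merely made small, which is the source of the sparsity described above. An L2 penalty, by contrast, rescales weights multiplicatively and so rarely produces exact zeros.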

Elastic net combines L1 and L2 penalties, balancing sparsity and stability. It is widely used in tabular machine learning and feature selection.
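As a rough illustration, the combined penalty can be written as a single function with a mixing parameter. The parametrisation below (an overall strength `lam` and an `l1_ratio` between 0 and 1) is one common convention, similar to the one used by scikit-learn's `ElasticNet`; the function itself is a hypothetical sketch.

```python
def elastic_net_penalty(weights, lam, l1_ratio):
    """Return lam * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)."""
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must lie in [0, 1]")
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


if __name__ == "__main__":
    w = [1.0, -2.0]
    print(elastic_net_penalty(w, lam=1.0, l1_ratio=1.0))  # pure L1: 3.0
    print(elastic_net_penalty(w, lam=1.0, l1_ratio=0.0))  # pure L2: 2.5
    print(elastic_net_penalty(w, lam=1.0, l1_ratio=0.5))  # mixture: 2.75
```

Setting `l1_ratio` to 1 recovers the lasso penalty and setting it to 0 recovers the ridge penalty, so the ratio acts as a dial between sparsity and stability.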

A unified way to think about these is that they shrink parameters toward simpler defaults: zero for L1 and L2, or the mean of a prior in Bayesian formulations, where L2 corresponds to a Gaussian prior on the weights and L1 to a Laplace prior.

Stochastic and architectural regularisation

Other techniques operate on the model or the training procedure rather than on the loss.

Dropout randomly zeroes a fraction of activations during training, forcing redundant representations and approximating an ensemble of subnetworks. It remains a standard component of fully connected and recurrent layers.
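The mechanism can be sketched in a few lines. This version uses the "inverted dropout" convention, in which surviving activations are scaled up during training so that no rescaling is needed at inference time. It works on a plain Python list and is an illustration only; the function name and signature are assumptions.

```python
import random


def dropout(activations, p, training=True, rng=random):
    """Zero each activation with probability p and scale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if not training or p == 0.0:
        # At inference time dropout is the identity.
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=rng))
    print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, training=False))
```

Because survivors are divided by the keep probability, the expected value of each activation is unchanged, which is why the same network can be used without dropout at inference.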

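Batch normalisation and layer normalisation stabilise activations. Batch normalisation in particular has a documented regularising side-effect, partly because mini-batch statistics inject noise during training; layer normalisation does not depend on batch statistics, so any regularising effect it has is weaker and less direct.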

Early stopping halts training when validation loss stops improving, preventing the model from continuing to memorise the training data.
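A common way to implement this is a patience counter: training stops once validation loss has failed to improve for a fixed number of consecutive epochs, and the checkpoint from the best epoch is kept. The sketch below applies that rule to a precomputed list of losses; the function name, `patience`, and `min_delta` are illustrative choices rather than any specific library's API.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch as the checkpoint to restore.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop here.
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.9]
    print(early_stopping_epoch(losses, patience=3))  # (2, 5)
```

In the example the best loss occurs at epoch 2, and training stops at epoch 5 after three epochs without improvement. A larger `patience` tolerates noisier validation curves at the cost of extra training time.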

Data augmentation synthesises additional training examples by applying label-preserving transformations: cropping, flipping, rotation, and colour jitter for vision; synonym substitution, back-translation, and span masking for text; pitch shifting and noise injection for audio. Mixup and cutmix are often grouped with augmentation, but they blend labels as well as inputs and are covered separately below. Augmentation is one of the most effective regularisers because it directly enlarges the effective training distribution.
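As a small illustration for the vision case, the sketch below applies a random crop followed by a random horizontal flip to an image stored as a nested list of pixel values, returning the label unchanged. The helper names and the 50% flip probability are assumptions for the example; real pipelines would normally use an image library rather than nested lists.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2-D image (list of rows)."""
    return [row[::-1] for row in image]


def random_crop(image, size, rng=random):
    """Cut a size x size window at a random position inside the image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image, label, crop_size, rng=random):
    """Apply a random crop and a random flip; the label is preserved."""
    out = random_crop(image, crop_size, rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out, label


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(augment(img, "defect", crop_size=3, rng=random.Random(1)))
```

Each call yields a slightly different view of the same example, so the model sees many variants per labelled image without any additional annotation effort.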

Label smoothing replaces hard one-hot training targets with slightly softened distributions, discouraging the model from becoming overconfident.
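A minimal sketch of the target construction follows. It assumes the common convention of mixing the one-hot vector with a uniform distribution over all classes, with smoothing strength `eps`; the function name is hypothetical.

```python
def smooth_labels(class_index, num_classes, eps=0.1):
    """Return (1 - eps) * one_hot + eps / num_classes as a list of probabilities."""
    off_value = eps / num_classes
    target = [off_value] * num_classes
    target[class_index] = 1.0 - eps + off_value
    return target


if __name__ == "__main__":
    print(smooth_labels(class_index=2, num_classes=4, eps=0.1))
```

With four classes and `eps` of 0.1, the true class receives roughly 0.925 and each other class roughly 0.025, so the cross-entropy loss no longer rewards pushing the predicted probability all the way to 1.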

Mixup, cutmix, and stochastic depth randomly combine or drop pieces of inputs or layers during training, producing strong empirical gains on image classification benchmarks.
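For mixup specifically, the core operation is a convex combination of two examples and of their labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution as in Zhang et al. (2018). The sketch below operates on flat Python lists and is illustrative only; the function name and the default `alpha` are assumptions.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels with weight lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    rng = random.Random(0)
    x, y, lam = mixup([1.0, 0.0, 0.5], [1.0, 0.0], [0.0, 1.0, 0.5], [0.0, 1.0], rng=rng)
    print(lam, x, y)
```

Because the labels are blended with the same weight as the inputs, the resulting target is a soft distribution rather than a single class, which is what distinguishes mixup from label-preserving augmentation. Small values of `alpha` concentrate the weight near 0 or 1, so most mixed examples stay close to one of the originals.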

Implicit regularisation

A line of research has highlighted that even without explicit regularisation, choices such as stochastic gradient descent, learning-rate schedules, weight initialisation, and architecture all bias optimisation toward solutions with desirable generalisation properties. The fact that overparameterised neural networks generalise at all is partly explained by this implicit regularisation.

Practical guidance

Production deep learning recipes typically combine several regularisers. A representative image classification recipe might use AdamW with weight decay 0.05, dropout 0.1 in the classification head, label smoothing 0.1, mixup and cutmix, random erasing, and a cosine learning-rate schedule with early stopping. Tabular boosted-tree models rely on limits on tree depth, row and column subsampling, and L2 leaf regularisation. Large language model pre-training relies primarily on weight decay and the implicit regularisation of large, diverse data; dropout is used lightly or omitted in many recent recipes.

| Technique | Modifies | Cost | Strength |
|---|---|---|---|
| L2 / weight decay | Loss | Trivial | Mild and broadly applicable |
| L1 / lasso | Loss | Trivial | Sparsity, interpretation |
| Dropout | Activations | Small | Strong on dense layers |
| Early stopping | Schedule | None | Always recommended |
| Data augmentation | Inputs | Moderate | Very strong on vision/audio |
| Label smoothing | Targets | Trivial | Calibration |
| Mixup / cutmix | Inputs | Small | Strong on classification |

Malaysian Context — Regularisation in Malaysian ML practice

Regularisation is foundational across Malaysian machine learning deployments and appears in most well-engineered training pipelines. Banks including Maybank, CIMB, RHB, Hong Leong Bank, and Public Bank rely on regularised gradient-boosted trees (LightGBM, XGBoost, CatBoost) for credit, fraud, and propensity scoring. The generalisation of these models under shifting customer behaviour is closely watched by Bank Negara Malaysia model-governance reviewers.

In manufacturing, system integrators such as ViTrox, Pentamaster, and Greatech apply data augmentation, dropout, and weight decay extensively in vision-QC models trained on relatively small defect datasets collected at semiconductor and E&E plants in Penang, Kulim, and Shah Alam. Augmentation is particularly important because rare defect classes are otherwise underrepresented.

In healthcare, Pantai Hospital, IHH Healthcare, and Ministry of Health (KKM) research initiatives, together with academic groups at Universiti Malaya Medical Centre (UMMC) and Universiti Sains Malaysia, emphasise rigorous cross-validation, augmentation, and Bayesian regularisation when training models on small clinical datasets, in part to satisfy Malaysia's Medical Device Authority (MDA).

HRD Corp-funded training programmes delivered through Malaysian universities (UM, UTM, UKM, USM, UPM, MMU, APU) and hyperscaler training providers (AWS, Microsoft, Google, NVIDIA) cover regularisation theory and practice as standard curriculum.

References

Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society.
Srivastava, N. et al. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR.
Loshchilov, I. and Hutter, F. (2019). Decoupled Weight Decay Regularization (AdamW). ICLR.
Zhang, H. et al. (2018). mixup: Beyond Empirical Risk Minimization. ICLR.
Müller, R., Kornblith, S. and Hinton, G. (2019). When Does Label Smoothing Help? NeurIPS.

Tags: regularisation, overfitting, generalisation, training

Type	Training technique family
Goal	Reduce overfitting, improve generalisation
Common methods	L1, L2, dropout, early stopping, data augmentation
Implicit forms	SGD noise, batch norm, weight initialisation
Related	Dropout, batch normalisation, gradient descent