Dropout

A regularisation technique in deep learning that randomly deactivates neurons during training, preventing co-adaptation and improving generalisation. Introduced by Hinton and colleagues in 2012 and formalised in 2014.

5 min read | Last updated May 2026 | Foundations

Dropout is a regularisation technique for deep neural networks in which, during training, each neuron is independently set to zero with some probability p on every forward pass. The technique was introduced by Geoffrey Hinton and colleagues in 2012 and formalised in the 2014 paper "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" by Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov. By preventing neurons from co-adapting to specific patterns in the training data, dropout reduces overfitting, and it became one of the most widely used regularisers in deep learning practice.

Mechanism

During training, dropout multiplies the activations of a layer by a random binary mask drawn from a Bernoulli distribution with parameter 1 minus p, where p is the dropout rate. Neurons whose mask value is zero are effectively removed from the network for that forward and backward pass, along with all of their incoming and outgoing connections. To preserve the expected magnitude of activations, the surviving activations are typically scaled by 1 divided by (1 minus p), a convention known as "inverted dropout". At inference time the full network is used without masking and without rescaling.
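The masking and rescaling steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a library implementation: the function name `inverted_dropout` and its arguments are chosen here for the example and do not come from any particular framework.

```python
import numpy as np

def inverted_dropout(x, p, rng, training=True):
    """Apply inverted dropout to activations x with drop probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return x  # inference: no mask and no rescaling
    keep = 1.0 - p
    # Each unit survives independently with probability keep = 1 - p.
    mask = rng.random(x.shape) < keep
    # Scale survivors by 1 / (1 - p) so the expected activation is unchanged.
    return x * mask / keep

rng = np.random.default_rng(0)
x = np.ones((1000, 200))
y = inverted_dropout(x, p=0.3, rng=rng)
mean_activation = y.mean()          # should stay close to 1.0
dropped_fraction = (y == 0).mean()  # should be close to p = 0.3
print(mean_activation, dropped_fraction)
```

Because the rescaling happens during training, the inference path simply returns its input, which matches the "no masking, no rescaling" behaviour described above.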

An equivalent view, articulated by the original authors, is that dropout trains an ensemble of exponentially many thinned sub-networks that share weights. At inference, the deterministic full network approximates the geometric mean of the ensemble's predictions.
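For the special case of a single softmax layer with dropout on its inputs, the ensemble view can be checked by brute force: enumerate every mask, take the probability-weighted geometric mean of the sub-network predictions, renormalise, and compare with the unmasked network. The sketch below assumes inverted dropout and uses small random weights invented for the example; for deeper networks the relationship is only approximate.

```python
import itertools
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_out, p = 6, 3, 0.5
keep = 1.0 - p
W = rng.normal(size=(n_out, n_in))  # illustrative random weights
b = rng.normal(size=n_out)
x = rng.normal(size=n_in)

# Probability-weighted sum of log-predictions over all 2**n_in masks.
log_geo = np.zeros(n_out)
for bits in itertools.product([0, 1], repeat=n_in):
    m = np.array(bits, dtype=float)
    prob_m = np.prod(np.where(m == 1, keep, p))  # probability of this mask
    probs = softmax(W @ (m * x / keep) + b)      # one thinned sub-network
    log_geo += prob_m * np.log(probs)

geo_mean = np.exp(log_geo)
geo_mean /= geo_mean.sum()  # renormalised geometric mean of the ensemble

full = softmax(W @ x + b)   # deterministic full network at inference
print(geo_mean, full)
```

The two printed vectors should agree up to floating-point error, because averaging the masked logits recovers the unmasked logits and the remaining normalisation terms cancel when the geometric mean is renormalised.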

Motivation

Geoffrey Hinton has recounted the intuition behind dropout in terms of bank fraud prevention: bank tellers were periodically rotated between branches to prevent them from forming collusive relationships. Analogously, randomly removing different neurons on each training example prevents complex co-adaptations among hidden units, forcing each neuron to learn features that are useful in many different contexts.

Empirical impact

Dropout produced significant improvements on supervised learning tasks in vision, speech recognition and document classification in the early 2010s, and was a key ingredient in AlexNet's 2012 ImageNet result. The technique became standard in feed-forward networks and convolutional neural networks. In recurrent neural networks, naive dropout on hidden-to-hidden connections can harm sequence modelling, leading to variants such as variational RNN dropout (which reuses the same mask at every time step), recurrent dropout and DropConnect applied to the recurrent weights.

Dropout in transformers and modern architectures

Dropout remains common in transformer models, where it is typically applied inside the feed-forward sub-layers, to the attention weights, and to each sub-layer's output before it is added back to the residual stream. Very large transformers trained on internet-scale corpora often use lower dropout rates than smaller models, or none at all, because the abundance of training data already limits overfitting. LayerDrop and stochastic depth are related techniques that randomly skip entire layers, whereas attention dropout zeroes individual attention weights rather than whole heads.
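To make the placement concrete, here is a simplified single-head attention sub-layer in NumPy with two dropout sites: one on the attention weights after the softmax, and one on the sub-layer output just before the residual addition. It is a sketch under stated assumptions: layer normalisation, multiple heads and masking are omitted, and the function and parameter names (`attention_sublayer`, `p_attn`, `p_resid`) are hypothetical.

```python
import numpy as np

def dropout(x, p, rng, training):
    """Inverted dropout; identity at inference."""
    if not training or p == 0.0:
        return x
    keep = 1.0 - p
    return x * (rng.random(x.shape) < keep) / keep

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_sublayer(x, Wq, Wk, Wv, Wo, p_attn, p_resid, rng, training):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores)
    weights = dropout(weights, p_attn, rng, training)  # attention dropout
    out = (weights @ v) @ Wo
    out = dropout(out, p_resid, rng, training)         # dropout on sub-layer output
    return x + out                                     # residual addition

rng = np.random.default_rng(0)
seq_len, d = 4, 8
x = rng.normal(size=(seq_len, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))

train_out = attention_sublayer(x, Wq, Wk, Wv, Wo, 0.1, 0.1, rng, training=True)
eval_a = attention_sublayer(x, Wq, Wk, Wv, Wo, 0.1, 0.1, rng, training=False)
eval_b = attention_sublayer(x, Wq, Wk, Wv, Wo, 0.1, 0.1, rng, training=False)
print(train_out.shape, np.array_equal(eval_a, eval_b))
```

With `training=False` both dropout calls are no-ops, so repeated evaluation calls return identical outputs, while training-mode calls differ from run to run.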

Monte Carlo dropout

Yarin Gal and Zoubin Ghahramani showed in 2016 that running multiple forward passes with dropout enabled at inference time and averaging the predictions yields an approximation to Bayesian model averaging. This technique, known as Monte Carlo dropout, is used to estimate predictive uncertainty in medical imaging, autonomous driving and other safety-critical applications without the cost of training a full Bayesian neural network. The spread of the sampled predictions serves as the uncertainty signal, although the resulting estimates are approximate and are not guaranteed to be well calibrated.
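The procedure can be sketched with a tiny two-layer classifier in NumPy. The weights below are random and untrained, so the numbers are meaningless as predictions; the point is only the control flow: keep dropout switched on, sample several stochastic forward passes, then summarise them. The helper names `forward` and `mc_dropout_predict` are invented for this example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, params, p, rng, use_dropout):
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    if use_dropout:
        keep = 1.0 - p
        h = h * (rng.random(h.shape) < keep) / keep  # inverted dropout
    return softmax(h @ W2 + b2)

def mc_dropout_predict(x, params, p, rng, n_samples=100):
    # Dropout stays enabled for every pass, unlike ordinary inference.
    samples = np.stack(
        [forward(x, params, p, rng, use_dropout=True) for _ in range(n_samples)]
    )
    mean = samples.mean(axis=0)  # averaged class probabilities
    std = samples.std(axis=0)    # per-class spread across passes
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1)  # predictive entropy
    return mean, std, entropy

rng = np.random.default_rng(0)
params = (
    rng.normal(size=(10, 32)), np.zeros(32),  # hypothetical untrained weights
    rng.normal(size=(32, 3)), np.zeros(3),
)
x = rng.normal(size=(5, 10))  # five example inputs
mean, std, entropy = mc_dropout_predict(x, params, p=0.5, rng=rng)
print(mean.shape, std.shape, entropy.shape)
```

Inputs with a large standard deviation or high predictive entropy are the ones a downstream system might flag for human review.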

Variants

DropConnect generalises dropout by masking individual weights rather than entire neurons. Spatial dropout removes entire feature maps in convolutional layers. Variational dropout learns the dropout rate per parameter through a Bayesian objective. Concrete dropout uses a continuous relaxation of the Bernoulli distribution to make the dropout rate differentiable.
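The first three variants differ mainly in what the random mask is applied to, which shows up directly in the mask's shape. The NumPy sketch below contrasts them on toy tensors. It is a simplification: DropConnect is shown with a single mask per forward pass and without the inference-time approximation used in the original paper, and the learned-rate variants (variational and concrete dropout) are not covered.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5
keep = 1.0 - p

x = rng.normal(size=(2, 4))  # (batch, units)
W = rng.normal(size=(4, 3))  # (units_in, units_out)

# Standard dropout: one mask entry per activation.
unit_mask = rng.random(x.shape) < keep
y_dropout = (x * unit_mask / keep) @ W

# DropConnect: one mask entry per weight; the activations are left intact.
weight_mask = rng.random(W.shape) < keep
y_dropconnect = x @ (W * weight_mask)

# Spatial dropout: one mask entry per (example, channel), broadcast over
# height and width so that a whole feature map is kept or removed together.
fmap = rng.normal(size=(2, 3, 5, 5))  # (batch, channels, height, width)
channel_mask = rng.random((2, 3, 1, 1)) < keep
y_spatial = fmap * channel_mask / keep

print(unit_mask.shape, weight_mask.shape, channel_mask.shape)
```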

Limitations and complements

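Dropout typically slows convergence, because each gradient step is a noisy estimate computed on a randomly thinned sub-network, so more updates are needed to reach the same training loss. It also interacts non-trivially with batch normalisation: dropout changes the variance of activations between training and inference, which can leave batch-normalisation statistics mismatched at test time. Modern practice often favours combining lower dropout rates with batch or layer normalisation, weight decay, label smoothing, mixup and strong data augmentation rather than relying on dropout alone.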
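One commonly cited reason for the interaction with batch normalisation is that inverted dropout preserves the mean of the activations but inflates their variance during training, so statistics collected in training mode no longer describe what a following layer sees at inference. The short NumPy experiment below illustrates the effect on synthetic activations; the distribution and dropout rate are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5
keep = 1.0 - p

# Synthetic activations with mean 1 and variance 1 (an arbitrary choice).
x = rng.normal(loc=1.0, scale=1.0, size=1_000_000)

# Training mode: inverted dropout. Inference mode: x is passed through as-is.
train = x * (rng.random(x.shape) < keep) / keep

infer_mean, infer_var = x.mean(), x.var()
train_mean, train_var = train.mean(), train.var()
print(infer_mean, train_mean)  # means stay close to each other
print(infer_var, train_var)    # training-mode variance is noticeably larger
```

For these settings the second moment of the training-mode activations is the inference-mode second moment divided by (1 minus p), so the variance roughly triples while the mean is unchanged.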

Malaysian Context — Dropout in local research and applied ML

Dropout is taught in foundational AI courses at most Malaysian universities, including Universiti Malaya, Universiti Sains Malaysia, Universiti Kebangsaan Malaysia, Universiti Putra Malaysia, Universiti Teknologi Malaysia and Monash University Malaysia. It features in HRD Corp accredited AI training programmes delivered by providers such as ALX Malaysia, Forward School in Penang and the Center of Applied Data Science.

In applied Malaysian research, dropout and Monte Carlo dropout are widely used in medical imaging projects at the Institute for Medical Research and the University of Malaya Medical Centre, where calibrated uncertainty is important for clinician trust in diabetic retinopathy and chest X-ray classifiers. The Malaysian Palm Oil Board has supported convolutional neural network projects for disease detection in oil palm imagery that rely on dropout for regularisation, particularly when training data per disease class is limited.

Local fintech firms regulated by the Securities Commission Malaysia (SC) and Bank Negara Malaysia (BNM) — including Boost, BigPay, MoneyMatch and PolicyStreet — use dropout regularly in their fraud detection and credit scoring models. Maybank's data science teams and CIMB's analytics functions reference dropout in published technical talks. The Malaysia Digital Economy Corporation (MDEC) MDAG-AI grants under the National AI Roadmap 2021–2025 have funded several R&D projects that benchmark dropout variants against modern regularisation alternatives.

References

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15(1), pp. 1929–1958.
Hinton, G., Srivastava, N., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. (2012). Improving Neural Networks by Preventing Co-adaptation of Feature Detectors. arXiv:1207.0580.
Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. ICML.
Wan, L., Zeiler, M., Zhang, S., LeCun, Y. and Fergus, R. (2013). Regularization of Neural Networks using DropConnect. ICML.

Tags: dropout, regularisation, deep learning, overfitting, Hinton

Type	Regularisation technique
Proposed by	Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov
Published	2014, Journal of Machine Learning Research
Typical rate	0.1 – 0.5 (probability of deactivation)
Used in	Feed-forward networks, CNNs, RNNs, transformers
Related	Batch normalisation, weight decay, data augmentation