AIWiki
Malaysia

Monte Carlo Methods

A broad class of computational algorithms that use repeated random sampling to obtain numerical results, widely used in machine learning for Bayesian inference, reinforcement learning, and uncertainty estimation.

5 min readLast updated May 2026Foundations

Monte Carlo methods are a family of computational algorithms that approximate quantities of interest — integrals, expectations, probabilities — by drawing repeated random samples from a probability distribution and aggregating the results. The methods take their name from the casinos of Monaco and were formalised in the 1940s at Los Alamos National Laboratory by Stanislaw Ulam, John von Neumann and Nicholas Metropolis as part of work on neutron diffusion. In machine learning they provide the foundational machinery for Bayesian inference, reinforcement learning, generative modelling and uncertainty estimation.

Core idea

If a quantity of interest can be written as an expectation of some function f under a distribution p, the Monte Carlo estimator is the simple sample mean: draw N independent samples from p, evaluate f on each, and average. By the law of large numbers the estimator converges to the true expectation, with an error that decreases as one over the square root of N. Crucially, this rate is independent of the dimension of the integral, which is why Monte Carlo dominates in high-dimensional problems where deterministic quadrature is intractable.

Markov Chain Monte Carlo

Direct sampling from a target distribution is often impossible, particularly for the posterior distributions that arise in Bayesian inference. Markov Chain Monte Carlo (MCMC) sidesteps this by constructing a Markov chain whose stationary distribution is the target. Running the chain long enough produces samples that, although correlated, behave as draws from the target for the purpose of estimating expectations.

The Metropolis-Hastings algorithm, introduced by Metropolis and colleagues in 1953 and generalised by Hastings in 1970, proposes candidate moves from a proposal distribution and accepts or rejects them according to a ratio of target densities. Gibbs sampling, a special case in which each variable is updated in turn from its conditional distribution, is widely used when those conditionals are tractable. Hamiltonian Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS) employ gradient information to propose long, efficient moves and underpin probabilistic programming systems such as Stan and PyMC.

Monte Carlo in reinforcement learning

In reinforcement learning, Monte Carlo methods estimate the value of a state or state-action pair as the average return observed across many sampled trajectories. They differ from temporal-difference methods such as Q-learning in that they wait until the end of an episode before updating estimates, trading higher variance for lower bias. Modern policy gradient algorithms, including REINFORCE and Proximal Policy Optimization, are Monte Carlo estimators of the policy gradient.

Monte Carlo dropout and uncertainty

Monte Carlo dropout, proposed by Yarin Gal and Zoubin Ghahramani in 2016, reinterprets dropout at inference time as approximate Bayesian inference. By running multiple stochastic forward passes through a dropout-enabled network and averaging the predictions, practitioners obtain calibrated predictive uncertainty without modifying the training procedure. This technique is now standard in medical imaging and safety-critical applications.

Sequential Monte Carlo and particle filters

Sequential Monte Carlo, also known as particle filtering, maintains a population of weighted samples that are propagated, reweighted and resampled to track a posterior over time. It is widely used in robotics for localisation and mapping, in epidemiology for disease tracking, and in financial modelling for stochastic volatility models.

Limitations

Monte Carlo methods suffer from high variance when the target distribution is poorly explored by the sampler. Diagnosing convergence of MCMC chains is notoriously difficult, with standard tools including the Gelman-Rubin statistic and effective sample size. Variational inference offers a deterministic alternative that trades exactness for speed and is often combined with Monte Carlo estimators in modern Bayesian deep learning.

References

  1. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics 21(6).
  2. Hastings, W. K. (1970). Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 57.
  3. Neal, R. M. (2011). MCMC Using Hamiltonian Dynamics. Handbook of Markov Chain Monte Carlo, CRC Press.
  4. Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. ICML.
  5. Bank Negara Malaysia. (2023). Financial Stability Review. BNM.