Hidden Markov Model
A statistical model that represents systems with unobservable (hidden) states that emit observable outputs, used widely in speech recognition, bioinformatics, and time-series analysis.
A hidden Markov model (HMM) is a probabilistic model in which a system is assumed to follow a Markov process with unobservable, or hidden, states. Each hidden state emits an observable output according to a probability distribution, and the sequence of observations is used to infer the most likely sequence of hidden states. HMMs combine two fundamental constructs: a Markov chain that governs state transitions and an emission model that connects hidden states to observed data. The framework was formalised in the late 1960s by Leonard E. Baum and colleagues at the Institute for Defense Analyses, and it became one of the dominant paradigms for sequence modelling for several decades.
Mathematical formulation
A discrete HMM is specified by a tuple consisting of a finite set of N hidden states, a set of M observation symbols, a transition probability matrix A where A_(ij) gives the probability of moving from state i to state j, an emission probability matrix B where B_(jk) gives the probability of observing symbol k while in state j, and an initial state distribution. The model assumes the Markov property — the probability of the next state depends only on the current state, not on the full history — and conditional independence of observations given the corresponding hidden state. Together these assumptions yield a tractable joint probability over hidden and observed sequences and enable efficient inference.
Canonical algorithms
Three classical problems are associated with HMMs. The evaluation problem asks for the probability of an observation sequence under a given model and is solved by the forward algorithm, which uses dynamic programming to sum over all possible hidden paths in time proportional to T times N squared. The decoding problem asks for the single most likely hidden state sequence given the observations and is solved by the Viterbi algorithm, which replaces the summation with a maximisation operation. The learning problem asks for the model parameters that maximise the likelihood of a training corpus and is solved by the Baum-Welch algorithm, a special case of expectation-maximisation that iteratively re-estimates transition and emission probabilities.
Applications
For much of the 1980s, 1990s, and 2000s, HMMs were the standard architecture for automatic speech recognition, with phonemes modelled as hidden states and acoustic feature vectors as observations. They were also widely used for part-of-speech tagging, named entity recognition, and other sequence labelling tasks in natural language processing. In bioinformatics, profile HMMs underpin tools such as HMMER and Pfam for identifying protein families and gene regions, and pair HMMs are used in sequence alignment. HMMs also appear in finance for regime-switching models, in robotics for state estimation, and in cybersecurity for intrusion detection.
Relationship to modern models
The dominance of HMMs in sequence modelling has been substantially eroded by neural architectures. Recurrent neural networks, long short-term memory networks, and especially transformer-based models have surpassed HMMs on most large-scale benchmarks because they can learn distributed representations and capture long-range dependencies that violate the strict Markov assumption. Nevertheless, HMMs retain advantages in low-data regimes, interpretability, and computational efficiency, and they continue to be taught as a foundational topic in probabilistic machine learning. Hybrid systems that combine HMMs with deep neural networks were also a transitional architecture in speech recognition in the early 2010s.
Variants and extensions
Several extensions of the basic HMM exist. Continuous-density HMMs replace the discrete emission distribution with a Gaussian or Gaussian mixture model, which is essential for modelling acoustic features. Hierarchical HMMs allow states to themselves be sub-HMMs, capturing structure at multiple temporal scales. Coupled and factorial HMMs model multiple interacting state chains, while input-output HMMs condition transitions and emissions on exogenous inputs. Conditional random fields generalise HMMs by allowing arbitrary feature functions and have largely displaced HMMs for supervised sequence labelling.
See Also
References
References
- Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2).
- Baum, L. E., and Petrie, T. (1966). Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Annals of Mathematical Statistics, 37.
- Eddy, S. R. (2004). What is a Hidden Markov Model? Nature Biotechnology, 22.
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, Chapter 13.