Naive Bayes Classifier
Naive Bayes is a family of probabilistic classifiers based on Bayes' theorem that assume conditional independence between features given the class, offering fast and effective classification, especially for text and other high-dimensional data.
Overview
Naive Bayes refers to a family of simple probabilistic classifiers built on Bayes' theorem together with a strong, deliberately simplifying assumption: that the input features are conditionally independent of one another given the class label. The assumption is rarely true in practice, which is why the method is called naive, yet the classifiers often perform surprisingly well, particularly on text and other high-dimensional data.
Bayes' theorem and the model
Bayes' theorem relates the probability of a class given the observed features to the probability of the features given the class. In plain terms, the classifier treats the posterior probability of a class as proportional to the prior probability of that class multiplied by the likelihood of the observed features given the class. Under the independence assumption, this joint likelihood factorises into a product of individual feature likelihoods, each of which can be estimated separately, so training and prediction stay fast even with thousands of features.
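Written out, the relationship described above takes the following form. The symbols are notation introduced here for illustration: y is a class label and x_1, ..., x_n are the observed feature values.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The omitted normalising constant, P(x_1, ..., x_n), is the same for every class, so it does not affect which class scores highest.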
To classify a new example, the model computes this quantity for every candidate class and selects the class with the highest posterior probability, a rule known as the maximum a posteriori decision.
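The decision rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `predict_map`, the class names and all probabilities are hypothetical, and the sketch assumes every observed feature has a non-zero likelihood under every class. It sums log-probabilities instead of multiplying raw probabilities, a common way to avoid numerical underflow when there are many features.

```python
import math

def predict_map(priors, likelihoods, features):
    """Return the class with the highest posterior score, plus all log scores.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature | class)}}
    features:    iterable of observed feature names
    """
    scores = {}
    for cls, prior in priors.items():
        # A sum of logs equals the log of the product of probabilities.
        score = math.log(prior)
        for f in features:
            score += math.log(likelihoods[cls][f])
        scores[cls] = score
    # Maximum a posteriori decision: pick the class with the largest score.
    return max(scores, key=scores.get), scores

# Hypothetical toy numbers, chosen only for illustration.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.30, "meeting": 0.05},
    "ham": {"offer": 0.05, "meeting": 0.25},
}
label, scores = predict_map(priors, likelihoods, ["offer"])
```

With these numbers the unnormalised scores for the single feature "offer" are 0.4 × 0.30 = 0.12 for spam and 0.6 × 0.05 = 0.03 for ham, so the sketch selects spam.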
Common variants
Three variants dominate. Multinomial naive Bayes models feature counts and is the standard choice for document classification using word frequencies. Bernoulli naive Bayes models the binary presence or absence of features. Gaussian naive Bayes assumes continuous features follow a normal distribution within each class and is used for real-valued data.
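As a rough illustration of the Gaussian variant, the sketch below fits a mean and variance to one feature within each class and scores a new value by its log density. The helper names, the class names and the height values are hypothetical. The sketch also assumes equal class priors and a single feature, so the prior term is left out.

```python
import math

def fit_gaussian(values):
    """Estimate the mean and population variance of one feature within one class."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def gaussian_log_likelihood(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

# Hypothetical heights in cm for two classes.
params = {
    "child": fit_gaussian([110.0, 120.0, 130.0]),
    "adult": fit_gaussian([160.0, 170.0, 180.0]),
}

x = 165.0
scores = {c: gaussian_log_likelihood(x, m, v) for c, (m, v) in params.items()}
best = max(scores, key=scores.get)
```

Here a value of 165 lies much closer to the adult mean of 170 than to the child mean of 120, so the adult class receives the higher log likelihood.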
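A practical refinement called Laplace or additive smoothing adds a small constant to every feature count before the probabilities are estimated. Without it, a feature value never observed with a class during training would receive a likelihood of zero, and that single zero would force the whole product for the class to zero regardless of the other evidence.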
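A minimal sketch of additive smoothing for word counts within one class follows. The function name `smoothed_word_probs`, the parameter `alpha`, the vocabulary and the tokens are all hypothetical, and the sketch assumes every token belongs to the given vocabulary.

```python
from collections import Counter

def smoothed_word_probs(tokens, vocabulary, alpha=1.0):
    """Estimate P(word | class) from the tokens of one class with additive smoothing.

    alpha is added to every count; alpha=1.0 is classic Laplace smoothing
    and alpha=0 gives the unsmoothed estimate.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}

# Hypothetical vocabulary and spam training tokens.
vocabulary = ["free", "offer", "meeting"]
spam_tokens = ["free", "offer", "free"]

probs = smoothed_word_probs(spam_tokens, vocabulary)
unsmoothed = smoothed_word_probs(spam_tokens, vocabulary, alpha=0.0)
```

In this toy case "meeting" never appears in the spam tokens, so its unsmoothed estimate is 0, while the smoothed estimate is (0 + 1) / (3 + 3) = 1/6. The smoothed probabilities still sum to 1 over the vocabulary.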
Strengths and limitations
Naive Bayes trains in a single pass over the data, requires little memory, handles many features gracefully and often performs reasonably well even with small training sets. These qualities make it a strong baseline for text classification. Its weaknesses follow from the independence assumption: when features are strongly correlated, the same evidence is effectively counted more than once, so probability estimates tend to be pushed toward 0 or 1 and become poorly calibrated, though the predicted class label is often still correct. The model also cannot capture interactions between features.
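The effect of correlated features on calibration can be seen in an extreme, hypothetical case where one feature is duplicated exactly. The sketch below assumes two classes; the function name and all numbers are invented for illustration.

```python
def posterior(prior_pos, lik_pos, lik_neg, copies):
    """Posterior of the positive class when one feature is counted `copies` times.

    lik_pos and lik_neg are the likelihoods of the observed feature value
    under the positive and negative class respectively.
    """
    pos = prior_pos * lik_pos ** copies
    neg = (1 - prior_pos) * lik_neg ** copies
    return pos / (pos + neg)

# The same piece of evidence, seen once and then as three identical copies.
once = posterior(0.5, 0.6, 0.4, copies=1)
thrice = posterior(0.5, 0.6, 0.4, copies=3)
```

Counted once, the evidence gives a posterior of 0.6. Counted three times, it gives 0.216 / 0.28, roughly 0.77, even though no new information has been added. The predicted label is the same in both cases; only the stated confidence is inflated.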
Applications
Naive Bayes has long been used for spam filtering, sentiment analysis, document categorisation, language detection and medical screening. It remains popular as a fast first model and as a benchmark against which more complex methods are compared.