What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, the local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Naive Bayes Classifier

Naive Bayes is a family of probabilistic classifiers based on Bayes' theorem that assume conditional independence between features, offering fast and effective classification especially for text and high-dimensional data.

3 min read | Last updated June 2026 | Foundations

Overview

Naive Bayes refers to a family of simple probabilistic classifiers built on Bayes' theorem together with a strong, deliberately simplifying assumption: that the input features are conditionally independent of one another given the class label. This assumption, which gives the method its name, rarely holds exactly in practice, yet the classifiers often perform surprisingly well, particularly on text and other high-dimensional data.

Bayes' theorem and the model

Bayes' theorem relates the probability of a class given the observed features to the probability of the features given the class. In plain terms, the classifier estimates the posterior probability of a class as proportional to the product of the prior probability of that class and the likelihood of each feature given the class. Because of the independence assumption, the joint likelihood is computed simply by multiplying the individual feature likelihoods, which makes training and prediction extremely fast even with thousands of features.
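Written out with C for the class and x_1, ..., x_n for the n observed features (symbols chosen here for illustration), the relationship described above is:

```latex
P(C \mid x_1, \ldots, x_n)
  = \frac{P(C)\, P(x_1, \ldots, x_n \mid C)}{P(x_1, \ldots, x_n)}
  \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The denominator is the same for every class, so it can be ignored when classes are compared, and the independence assumption is what turns the joint likelihood into the product on the right.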

To classify a new example, the model computes this quantity for every candidate class and selects the class with the highest posterior probability, a rule known as the maximum a posteriori decision.
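The maximum a posteriori rule can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name `map_predict`, the class names, the words and all probabilities are hypothetical. The sketch adds log-probabilities instead of multiplying raw probabilities, a common implementation choice that ranks classes identically while avoiding floating-point underflow when there are many features.

```python
import math


def map_predict(priors, likelihoods, features):
    """Return the class with the highest unnormalised log posterior.

    priors:      dict mapping class -> P(class)
    likelihoods: dict mapping class -> {feature: P(feature | class)}
    features:    list of observed features (repeats count multiple times)
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # log P(c) + sum of log P(x_i | c): same ordering as the raw product.
        score = math.log(prior) + sum(math.log(likelihoods[c][f]) for f in features)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical, hand-specified probabilities for a two-class spam example.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham": {"free": 0.05, "meeting": 0.20},
}
print(map_predict(priors, likelihoods, ["free", "free"]))  # → spam
print(map_predict(priors, likelihoods, ["meeting"]))       # → ham
```

For ["free", "free"] the unnormalised scores are 0.4 × 0.3 × 0.3 = 0.036 for spam against 0.6 × 0.05 × 0.05 = 0.0015 for ham, so spam wins; for ["meeting"] ham wins with 0.12 against 0.02.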

Common variants

Three variants dominate. Multinomial naive Bayes models feature counts and is the standard choice for document classification using word frequencies. Bernoulli naive Bayes models the binary presence or absence of features. Gaussian naive Bayes assumes continuous features follow a normal distribution within each class and is used for real-valued data.
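The Gaussian variant is the simplest to show from scratch, because each class needs only a prior plus a mean and variance for every feature. The following is a minimal pure-Python sketch under stated assumptions: the function names, the toy data and the tiny variance floor are illustrative choices, not part of any particular library.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors plus a per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Assumption: a tiny constant keeps the variance positive if a feature
        # happens to be constant within a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_normal_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gaussian_nb(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    def log_score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_normal_pdf(x, m, v) for x, m, v in zip(row, means, variances)
        )
    return max(model, key=log_score)


# Hypothetical toy data: two real-valued features, two well-separated classes.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 8.0], [5.2, 7.8], [4.8, 8.2]]
y = ["small", "small", "small", "large", "large", "large"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.1]))  # → small
print(predict_gaussian_nb(model, [5.1, 7.9]))  # → large
```

Summing one log density per feature is the independence assumption at work: each feature contributes its own normal likelihood, and no covariance between features is modelled.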

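A practical refinement called Laplace or additive smoothing adds a small pseudo-count to every feature count, so that a feature never seen with a class during training does not receive a probability of zero. Because the likelihoods are multiplied, a single zero would otherwise force that class's entire score to zero, regardless of all the other evidence.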
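The effect of smoothing can be seen in a small multinomial naive Bayes text classifier. This is a minimal sketch: the function names, the four toy documents and the choice to skip words never seen in training are assumptions made for illustration. With a pseudo-count alpha, each class-conditional word probability is (count of the word in the class + alpha) divided by (total words in the class + alpha × vocabulary size); alpha = 1 is Laplace smoothing.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with additive smoothing (Laplace when alpha=1)."""
    vocab = {w for doc in docs for w in doc}
    priors, cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, label in zip(docs, labels) if label == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        # P(w | c) = (count(w, c) + alpha) / (total words in c + alpha * |V|)
        cond[c] = {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}
    return priors, cond, vocab


def predict_multinomial_nb(model, doc):
    """Return the class with the highest log posterior for a tokenised document."""
    priors, cond, vocab = model

    def log_posterior(c):
        # Assumption: words outside the training vocabulary are simply skipped.
        return math.log(priors[c]) + sum(math.log(cond[c][w]) for w in doc if w in vocab)

    return max(priors, key=log_posterior)


# Hypothetical tokenised training messages.
docs = [["free", "prize", "win"], ["free", "offer"], ["meeting", "agenda"], ["lunch", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels)
print(predict_multinomial_nb(model, ["free", "meeting", "prize"]))  # → spam
print(round(model[1]["ham"]["free"], 4))  # → 0.0909
```

"free" never occurs in a ham message, so without smoothing P("free" | ham) would be 0 and no message containing "free" could ever be labelled ham. With alpha = 1 it becomes (0 + 1) / (4 + 7) = 1/11, since the ham documents contain 4 words and the vocabulary has 7.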

Strengths and limitations

Naive Bayes trains in a single pass over the data, requires little memory, handles many features gracefully and performs reliably with small training sets. These qualities make it a strong baseline for text classification. Its weaknesses follow from the independence assumption: when features are strongly correlated, probability estimates become poorly calibrated, though the predicted class label is often still correct. The model also cannot capture interactions between features.
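The calibration problem can be illustrated with an extreme case of correlation: the same piece of evidence copied into several features. Since the copies carry no new information, the correct posterior is the same as with a single copy, but naive Bayes multiplies the repeated likelihood as if each copy were independent evidence. The function name and the probabilities below are hypothetical, chosen only to make the effect visible.

```python
def posterior_spam(prior_spam, p_given_spam, p_given_ham, copies):
    """Naive Bayes P(spam | evidence) when one feature is counted `copies` times."""
    spam = prior_spam * p_given_spam ** copies
    ham = (1 - prior_spam) * p_given_ham ** copies
    return spam / (spam + ham)  # normalise over the two classes


for copies in (1, 3, 5):
    p = posterior_spam(0.5, 0.6, 0.3, copies)
    print(copies, round(p, 3), "spam" if p > 0.5 else "ham")
# 1 0.667 spam
# 3 0.889 spam
# 5 0.97 spam
```

The predicted label never changes, but the reported confidence climbs from 2/3 towards 1, which is the pattern described above: a usable class decision paired with an overconfident probability.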

Applications

Naive Bayes has long been used for spam filtering, sentiment analysis, document categorisation, language detection and medical screening. It remains popular as a fast first model and as a benchmark against which more complex methods are compared.

Malaysian Context — Naive Bayes in Local Applications

Naive Bayes is widely taught and applied in Malaysian academic and industry settings as an accessible entry point to text classification. Researchers at universities such as Universiti Teknologi Malaysia, Universiti Kebangsaan Malaysia and Universiti Sains Malaysia have published work using naive Bayes for sentiment analysis of Malay-language and code-switched social media content, an important capability given Malaysia's multilingual environment of Malay, English, Mandarin and Tamil.

Local businesses and government agencies use naive Bayes for spam and abuse filtering, customer feedback categorisation and support-ticket routing. Its low computational cost suits small and medium enterprises supported by MDEC's digitalisation programmes, as these businesses often lack the infrastructure for heavier models.

Naive Bayes features prominently in data science courses subsidised by HRD Corp and in university machine learning curricula. As with any model processing customer messages or personal data, deployments must observe the Personal Data Protection Act 2010, administered by the Personal Data Protection Department under the Ministry of Digital.

References

Bayes, T. (1763). An Essay towards Solving a Problem in the Doctrine of Chances. Philosophical Transactions of the Royal Society.
McCallum, A. and Nigam, K. (1998). A Comparison of Event Models for Naive Bayes Text Classification. AAAI Workshop.
Manning, C., Raghavan, P. and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.

Tags: classification, probabilistic model, Bayes' theorem, text classification

Type	Probabilistic classifier
Based on	Bayes' theorem
Key assumption	Conditional feature independence
Common variants	Multinomial, Gaussian, Bernoulli
Related	Bayesian inference, Sentiment analysis