AIWiki
Malaysia

Sentiment Analysis

Sentiment analysis is a natural language processing technique that automatically identifies and classifies the emotional tone of text as positive, negative, or neutral, and is widely used in customer feedback, social media monitoring, and financial analysis.

Last updated May 2026

Sentiment analysis, also known as opinion mining, is a branch of natural language processing (NLP) concerned with the computational identification of subjective information in text — principally the emotional tone, attitude, or opinion expressed by a writer toward a subject. The most basic form of sentiment analysis classifies text into positive, negative, or neutral categories. More advanced systems perform fine-grained analysis, detecting specific emotions such as anger, joy, fear, or surprise, or identifying the sentiment directed toward particular entities or aspects within a text (aspect-based sentiment analysis).

The field emerged as a formal research area in the early 2000s following the growth of online reviews and forums, and has since become one of the most commercially applied areas of NLP, used across industries from consumer goods and media to finance and human resources.

Task Formulation

Sentiment analysis encompasses several related but distinct tasks. Document-level sentiment analysis assigns a single sentiment label to an entire document, such as a product review. Sentence-level analysis evaluates the sentiment of individual sentences within a document. Aspect-based sentiment analysis (ABSA) identifies both the specific aspect being discussed (for example, battery life in a smartphone review) and the sentiment expressed toward that aspect. This finer-grained approach provides more actionable intelligence for businesses than document-level classification.
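The difference between document-level and aspect-based analysis can be shown in a few lines of Python. This is a toy illustration, not a real ABSA system. The aspect terms, the opinion-word scores, and the helper names (`document_sentiment`, `aspect_sentiments`) are hypothetical, and clauses are split naively on punctuation and the word "but".

```python
import re

# Hypothetical aspect terms (surface word -> aspect name) and opinion-word polarities.
ASPECT_TERMS = {"battery": "battery life", "screen": "display", "camera": "camera"}
OPINION_WORDS = {"great": 1, "excellent": 1, "sharp": 1, "poor": -1, "terrible": -1, "dies": -1}


def polarity(score):
    """Map a numeric score to a three-way label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def clause_score(clause):
    """Sum the polarity of known opinion words in one piece of text."""
    return sum(OPINION_WORDS.get(tok, 0) for tok in re.findall(r"[a-z]+", clause.lower()))


def document_sentiment(text):
    """Document level: a single label for the whole text."""
    return polarity(clause_score(text))


def aspect_sentiments(text):
    """Aspect level: credit each clause's score to the aspects it mentions."""
    totals = {}
    # Naive clause splitting on punctuation and the contrastive word "but".
    for clause in re.split(r"[.,;!?]|\bbut\b", text.lower()):
        score = clause_score(clause)
        for term, aspect in ASPECT_TERMS.items():
            if term in clause:
                totals[aspect] = totals.get(aspect, 0) + score
    return {aspect: polarity(score) for aspect, score in totals.items()}


review = "The screen is sharp and the camera is excellent, but the battery dies by noon."
print(document_sentiment(review))  # positive
print(aspect_sentiments(review))
```

The document-level label comes out positive. The aspect-level output separates the praised display and camera from the negative battery-life mention, and that separation is the extra detail aspect-based analysis is meant to surface.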

Entity-level sentiment analysis identifies named entities mentioned in a text and determines the sentiment expressed toward each, enabling, for instance, the tracking of public opinion about individual companies or political figures across news articles and social media.

Methods and Algorithms

Lexicon-Based Methods

The earliest and most interpretable sentiment analysis systems use sentiment lexicons: dictionaries in which words are associated with pre-assigned sentiment scores or polarity labels. Well-known English lexicons include SentiWordNet and VADER (Valence Aware Dictionary and sEntiment Reasoner). The sentiment of a text is computed by aggregating the scores of its constituent words, with adjustments for negation (so "not good" is scored as negative) and intensifiers (so "very good" scores higher than "good").
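Below is a minimal sketch of lexicon scoring with negation and intensifier handling. It is loosely in the spirit of rule-based tools such as VADER but is not a reimplementation of VADER. The word scores, the multiplier values, and the two-token look-back window are all assumptions chosen for illustration.

```python
import re

# Toy lexicon with hypothetical valence scores (published lexicons are much larger).
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "awful": -3.1, "love": 3.2}
NEGATORS = {"not", "never", "no"}
# Hypothetical multiplicative boosts for intensifiers and downtoners.
INTENSIFIERS = {"very": 1.3, "extremely": 1.5, "slightly": 0.5}


def lexicon_score(text, window=2):
    """Sum word scores, flipping the sign after a negator and scaling after an intensifier.

    Only the `window` tokens immediately before a sentiment word are inspected,
    so "not very good" is handled but long-distance negation is not.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = LEXICON[tok]
        for prev in tokens[max(0, i - window):i]:
            if prev in INTENSIFIERS:
                score *= INTENSIFIERS[prev]
            elif prev in NEGATORS:
                score = -score
        total += score
    return total


for sentence in ["good", "very good", "not good", "the food was not bad"]:
    print(sentence, "->", round(lexicon_score(sentence), 2))
```

The fixed look-back window keeps the rules simple and transparent. The trade-off is that a negator placed further from the sentiment word, as in "I do not think it is good", is missed.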

Lexicon-based methods are fast, transparent, and require no training data, making them useful for domains where labelled data is unavailable. Their main limitation is sensitivity to domain shift: a word that is positive in one context may be negative in another (for example, "unpredictable" can be positive when describing a film plot but negative when describing software behaviour).
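Domain shift can be made concrete by scoring the same word under two lexicons. Both lexicons and their values are hypothetical. The film lexicon simply overrides the general entry for "unpredictable".

```python
# Hypothetical polarity values chosen only to illustrate domain shift.
GENERAL_LEXICON = {"unpredictable": -1.0, "gripping": 1.0, "buggy": -1.0}
# In film reviews an unpredictable plot is usually praise, so that entry is overridden.
FILM_LEXICON = {**GENERAL_LEXICON, "unpredictable": 1.0}


def score(text, lexicon):
    """Sum lexicon polarities over whitespace-separated, lower-cased tokens."""
    return sum(lexicon.get(tok, 0.0) for tok in text.lower().split())


print(score("the plot was unpredictable", FILM_LEXICON))
print(score("the app was unpredictable", GENERAL_LEXICON))
```

The two lexicons disagree on the same word. For this reason, a general-purpose lexicon may need domain-specific adjustments before it is applied to a new kind of text.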

Machine Learning Methods

Supervised machine learning approaches frame sentiment analysis as a text classification problem. Features are extracted from text (bag-of-words representations, n-grams, term frequency-inverse document frequency (TF-IDF) vectors) and fed to classifiers such as Naive Bayes, Support Vector Machines (SVMs), or logistic regression. These methods require labelled training data. Given enough in-domain examples, however, they typically handle nuanced or domain-specific expressions better than purely lexicon-based approaches, because they learn which features predict each label instead of relying on fixed word scores.
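As one concrete instance of the supervised approach, the sketch below implements multinomial Naive Bayes over bag-of-words counts with add-one (Laplace) smoothing in plain Python. The four training sentences and their labels are invented for illustration. A real system would need far more labelled data, and TF-IDF features or a different classifier could be substituted.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens; a bag of words ignores their order."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            logp = math.log(n / n_docs)  # log prior P(label)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


train_texts = [
    "great phone, love the camera",
    "battery is terrible",
    "excellent value and great screen",
    "awful support, terrible experience",
]
train_labels = ["pos", "neg", "pos", "neg"]
model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("great camera"), model.predict("terrible battery"))  # pos neg
```

Working in log probabilities avoids numerical underflow when many word probabilities are multiplied together. In practice, scikit-learn's `CountVectorizer` or `TfidfVectorizer` combined with `MultinomialNB` or `LogisticRegression` provides the same kind of pipeline with more robust tokenisation and evaluation tooling.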

Deep Learning Methods

Recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs) improved sentiment analysis performance significantly through the 2010s by capturing sequential dependencies and local n-gram patterns in text. Pre-trained word embeddings such as Word2Vec and GloVe provided better initial feature representations than bag-of-words approaches.

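The introduction of transformer-based pre-trained models, beginning with BERT (Bidirectional Encoder Representations from Transformers) in 2018, produced a step change in performance. Fine-tuning BERT on labelled sentiment datasets achieved state-of-the-art results on standard benchmarks at the time, including the SST-2 sentiment task in the GLUE benchmark. Subsequent models refined the approach. RoBERTa improved the pre-training procedure, DistilBERT offered a smaller and faster distilled variant, and XLM-RoBERTa extended pre-training to about 100 languages, making the approach available for multilingual sentiment analysis.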

Large language models such as GPT-4, Claude, and Gemini can perform sentiment analysis through zero-shot prompting (instructions only) or few-shot prompting (instructions plus a handful of labelled examples in the prompt). This makes competitive sentiment classification possible without a labelled training set for the target domain.
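Prompt-based classification largely comes down to two steps: building the prompt, and mapping the model's free-text reply back to a fixed label set. The sketch below covers only those two steps. The prompt wording, the example texts, and the function names are hypothetical. The call to an actual model API is left out because it differs by provider.

```python
import re

# Hypothetical demonstrations; any small, representative labelled set would do.
EXAMPLES = [
    ("The delivery was fast and the packaging was perfect.", "positive"),
    ("The app keeps crashing when I open it.", "negative"),
    ("The parcel arrived on Tuesday.", "neutral"),
]
LABELS = ("positive", "negative", "neutral")


def build_prompt(text, examples=EXAMPLES):
    """Build a classification prompt; examples=[] gives a zero-shot prompt."""
    lines = ["Classify the sentiment of each text as positive, negative, or neutral.", ""]
    for example, label in examples:
        lines += [f"Text: {example}", f"Sentiment: {label}", ""]
    lines += [f"Text: {text}", "Sentiment:"]
    return "\n".join(lines)


def parse_label(reply):
    """Return the first allowed label found in the model's reply, or None."""
    for word in re.findall(r"[a-z]+", reply.lower()):
        if word in LABELS:
            return word
    return None


prompt = build_prompt("Customer service never replied to my emails.")
print(prompt)
# The prompt would be sent to an LLM here (provider-specific call omitted).
print(parse_label("Negative."))  # negative
```

The parser is deliberately naive: a reply such as "not positive" would be mapped to "positive". Constraining the reply format in the instructions, or validating it more strictly, is a common refinement.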

Applications

The most widespread commercial application is customer feedback analysis. Businesses collect reviews, support tickets, and survey responses and use sentiment analysis to summarise the overall sentiment, track sentiment trends over time, and identify specific product features or service aspects that drive satisfaction or dissatisfaction.
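Once each piece of feedback has a sentiment score from one of the methods described earlier, trend tracking and driver analysis reduce to simple aggregations. The records, the aspect names, and the score range of -1 to 1 below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scored feedback: (ISO date, aspect, sentiment score in [-1, 1]).
feedback = [
    ("2025-03-04", "delivery", 0.8),
    ("2025-03-19", "price", -0.4),
    ("2025-04-02", "delivery", -0.6),
    ("2025-04-15", "delivery", -0.9),
    ("2025-04-28", "price", 0.5),
]


def monthly_trend(records):
    """Average sentiment per calendar month (YYYY-MM), in chronological order."""
    by_month = defaultdict(list)
    for date, _aspect, score in records:
        by_month[date[:7]].append(score)
    return {month: round(mean(scores), 3) for month, scores in sorted(by_month.items())}


def negative_drivers(records):
    """Count negative mentions per aspect, most complained-about first."""
    counts = defaultdict(int)
    for _date, aspect, score in records:
        if score < 0:
            counts[aspect] += 1
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


print(monthly_trend(feedback))
print(negative_drivers(feedback))
```

In this toy data, average sentiment falls from March to April, and delivery accounts for most of the negative mentions. That is the kind of signal a feedback dashboard would surface for follow-up.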

Social media monitoring tools use sentiment analysis to track public opinion about brands, products, campaigns, and public figures across platforms such as X (formerly Twitter), Facebook, and Instagram. Financial services firms apply sentiment analysis to news articles, earnings call transcripts, and analyst reports to extract signals correlated with market movements. This use is usually grouped under alternative data or, more specifically, textual analysis in finance.

In human resources, companies analyse employee engagement survey responses and internal communication to gauge workforce sentiment and identify early signals of attrition risk. Healthcare providers analyse patient feedback and clinical notes to monitor patient experience and identify concerns.

Challenges

Sentiment analysis faces several persistent challenges. Sarcasm and irony are difficult to detect without contextual understanding, because the literal words often carry the opposite polarity to the intended meaning. Implicit sentiment, where the emotional tone is conveyed through factual statements rather than explicit sentiment words, is problematic for lexicon-based and n-gram-based methods. Multilingual sentiment analysis requires labelled data and lexicons for each target language, and low-resource languages remain underserved.
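The first two failure modes are easy to reproduce with a plain word-counting scorer. The lexicon below is a hypothetical toy. The example shows only that word-level scoring rewards the literal "great" in a sarcastic complaint and assigns zero to a factual statement that implies dissatisfaction.

```python
# Hypothetical mini-lexicon for illustration only.
LEXICON = {"great": 1.0, "love": 1.0, "terrible": -1.0, "broken": -1.0}


def word_score(text):
    """Sum lexicon scores of lower-cased tokens with trailing punctuation stripped."""
    return sum(LEXICON.get(tok.strip(".,!?"), 0.0) for tok in text.lower().split())


sarcastic = "Great, the update deleted all my photos."  # intended: negative
implicit = "The battery lasted two hours."  # intended: negative (for a phone)

print(word_score(sarcastic))  # scored as positive
print(word_score(implicit))  # scored as neutral
```

Both sentences express dissatisfaction, yet neither receives a negative score. Recovering the intended polarity requires context that word-level features do not capture.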

Domain adaptation is another challenge: a model trained on film reviews may perform poorly on financial news because the vocabulary and sentiment conventions differ substantially. Cross-domain and domain-adaptive sentiment models are an active research area.


References

  1. Pang, B., and Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
  2. Devlin, J. et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019.
  3. Hutto, C., and Gilbert, E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Proceedings of ICWSM 2014.
  4. Pontiki, M. et al. (2016). SemEval-2016 Task 5: Aspect Based Sentiment Analysis. Proceedings of SemEval 2016.
  5. Abdullah, M.T. et al. (2022). Sentiment Analysis for Bahasa Malaysia Social Media Text: A Survey. IEEE Access, 10, 58637-58659.