What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

BERT

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer-based language model developed by Google that reads text bidirectionally to understand word context in natural language tasks.

6 min read | Last updated June 2026 | Category: Models

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained natural language processing (NLP) model developed by Google AI Language and introduced in October 2018. It marked a significant advance in machine language understanding because each token's representation is conditioned on the words to its left and to its right at the same time, whereas earlier models processed text in a single direction. BERT's pre-train-then-fine-tune methodology became a template for many later language models, especially encoder-based ones.

Architecture

BERT is built on the Transformer encoder architecture introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. Unlike GPT, which uses the decoder portion of the Transformer, BERT uses only the encoder stack. This makes BERT well suited to tasks that require understanding the full context of a sentence rather than generating new text.
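The practical difference between an encoder-style and a decoder-style stack can be illustrated with the attention mask each one uses. The following is a minimal, dependency-free sketch rather than code from any Transformer library; the function names and the example sentence are hypothetical.

```python
def bidirectional_mask(seq_len):
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * seq_len for _ in range(seq_len)]


def causal_mask(seq_len):
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


def visible_positions(mask, i):
    """Indices that position i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


tokens = ["the", "bank", "of", "the", "river"]
enc = bidirectional_mask(len(tokens))
dec = causal_mask(len(tokens))

# For the ambiguous word "bank" (index 1), the encoder-style mask exposes
# "river" on the right, while the causal mask exposes only "the" and "bank".
enc_context = [tokens[j] for j in visible_positions(enc, 1)]
dec_context = [tokens[j] for j in visible_positions(dec, 1)]
print(enc_context)
print(dec_context)
```

With the full mask, the representation of "bank" can draw on "river" to resolve its meaning; with the causal mask it cannot, which is why encoder-only models tend to be preferred for understanding tasks.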

The original BERT was released in two sizes. BERT-Base contains 12 transformer encoder layers, 12 attention heads, and 110 million parameters. BERT-Large contains 24 layers, 16 attention heads, and 340 million parameters. Both variants take as input a sequence of tokens and output a contextualised embedding for every token in the sequence.
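The parameter counts above can be roughly reproduced with some arithmetic. The sketch below is an estimate under stated assumptions that do not appear in this article: hidden sizes of 768 (Base) and 1,024 (Large), a 30,522-entry WordPiece vocabulary, 512 positions, two segment types, a feed-forward layer four times the hidden size, and a pooler layer. The function name is hypothetical.

```python
def bert_param_count(layers, hidden, vocab_size=30522, max_positions=512,
                     segment_types=2, ffn_multiplier=4):
    """Approximate parameter count for a BERT-style encoder with a pooler."""
    layer_norm = 2 * hidden  # scale and bias

    # Token, position, and segment embedding tables plus one LayerNorm.
    embeddings = (vocab_size + max_positions + segment_types) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections with biases.
    attention = 4 * (hidden * hidden + hidden)

    # Feed-forward block: hidden -> ffn -> hidden, with biases.
    ffn = ffn_multiplier * hidden
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)

    per_layer = attention + layer_norm + feed_forward + layer_norm

    # Pooler: one dense layer applied to the first token's output.
    pooler = hidden * hidden + hidden

    return embeddings + layers * per_layer + pooler


base = bert_param_count(layers=12, hidden=768)
large = bert_param_count(layers=24, hidden=1024)
print(f"BERT-Base:  {base / 1e6:.1f}M parameters")
print(f"BERT-Large: {large / 1e6:.1f}M parameters")
```

Under these assumptions the estimate comes to about 109.5M and 335.1M parameters, close to the rounded 110 million and 340 million figures. The number of attention heads does not appear in the calculation because the heads split the hidden dimension rather than adding to it.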

Positional and segment embeddings are added to the token embeddings so the model retains information about word order and about which sentence each token belongs to. A special classification token ([CLS]) is prepended to every input sequence, and a separator token ([SEP]) marks the end of each sentence, which distinguishes paired sentences. The [CLS] token's output embedding is commonly used as the sequence-level representation for classification tasks.
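The input layout described here can be sketched in a few lines. This is an illustrative example, not BERT's actual preprocessing code: it assumes the text is already split into tokens (real BERT uses WordPiece subword tokenisation), and it adds the segment ids (0 for the first sentence, 1 for the second) that the original BERT paper uses alongside the separator token. The function name is hypothetical.

```python
def build_bert_input(tokens_a, tokens_b=None):
    """Assemble tokens, segment ids, and position ids for one BERT input."""
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)

    if tokens_b is not None:
        second = list(tokens_b) + ["[SEP]"]
        tokens += second
        segment_ids += [1] * len(second)

    # One position id per token; the model looks up a learned embedding for each.
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


tokens, segments, positions = build_bert_input(
    ["where", "is", "penang"], ["it", "is", "in", "malaysia"]
)
for row in zip(positions, segments, tokens):
    print(row)
```

Each of the three lists is mapped to an embedding table, and the three embeddings are summed per token before entering the first encoder layer.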

Training Methodology

BERT is trained using two unsupervised pre-training objectives: Masked Language Modelling (MLM) and Next Sentence Prediction (NSP).

In Masked Language Modelling, a random subset of input tokens, approximately 15 percent, is selected for prediction. Of the selected tokens, 80 percent are replaced with a special [MASK] token, 10 percent are replaced with a random token, and 10 percent are left unchanged; the model is trained to predict the original tokens from the surrounding context. This objective forces the model to learn representations that incorporate both left and right context simultaneously, which is the key innovation distinguishing BERT from earlier models such as GPT-1, which is unidirectional, and ELMo, which concatenates separately trained left-to-right and right-to-left models.
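A minimal sketch of the masking step follows. It applies the 80/10/10 corruption rule from the original BERT paper to roughly 15 percent of positions. The function name, the toy vocabulary, and the fixed seed are assumptions made for illustration; real pipelines also exclude special tokens such as [CLS] and [SEP] from selection, which this sketch omits.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted_tokens, labels) for masked language modelling.

    labels[i] holds the original token at selected positions and None elsewhere.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)

    # Choose roughly mask_prob of the positions, and at least one.
    num_to_select = max(1, round(len(tokens) * mask_prob))
    selected = rng.sample(range(len(tokens)), num_to_select)

    for i in selected:
        labels[i] = tokens[i]
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = "[MASK]"           # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged

    return corrupted, labels


sentence = "the model learns to predict hidden words from both sides of the context".split()
vocab = sorted(set(sentence))
corrupted, labels = mask_tokens(sentence, vocab)
print(corrupted)
print(labels)
```

The training loss is computed only at positions where the label is not None, so the model is never rewarded for simply copying the tokens that were left visible.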

In Next Sentence Prediction, the model is given pairs of sentences and trained to predict whether the second sentence follows the first in the original text. Half of the training pairs are genuinely consecutive sentences, and the other half pair the first sentence with a random sentence from the corpus. This task was designed to help BERT understand inter-sentence relationships, which is relevant for tasks such as question answering and natural language inference.
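Building training pairs for this objective can be sketched as follows. The example assumes a corpus already split into documents and sentences, and it uses the 50/50 split between genuine and random second sentences described in the original BERT paper. The function name and the sample documents are hypothetical.

```python
import random


def make_nsp_pairs(documents, seed=0):
    """Build (sentence_a, sentence_b, is_next) examples.

    documents is a list of documents, each a list of sentence strings.
    At least two documents are needed so negatives can come from elsewhere.
    """
    rng = random.Random(seed)
    examples = []
    for doc_index, doc in enumerate(documents):
        for i in range(len(doc) - 1):
            if rng.random() < 0.5:
                # Positive example: the sentence that really follows.
                examples.append((doc[i], doc[i + 1], True))
            else:
                # Negative example: a random sentence from a different document.
                other_docs = [d for j, d in enumerate(documents) if j != doc_index]
                sentence_b = rng.choice(rng.choice(other_docs))
                examples.append((doc[i], sentence_b, False))
    return examples


docs = [
    ["BERT uses an encoder.", "It reads context from both sides.", "It was released in 2018."],
    ["Penang is in Malaysia.", "George Town is its capital."],
]
examples = make_nsp_pairs(docs)
for sentence_a, sentence_b, is_next in examples:
    print(is_next, "|", sentence_a, "|", sentence_b)
```

Each pair would then be packed into a single input sequence, and the classifier on the first token's output would be trained to predict the is_next label.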

BERT was pre-trained on the BookCorpus (800 million words) and English Wikipedia (2.5 billion words), totalling roughly 3.3 billion words.

Fine-Tuning

One of BERT's defining contributions was demonstrating that a single pre-trained model could be fine-tuned with minimal task-specific modifications to achieve state-of-the-art performance across a wide range of NLP benchmarks. Fine-tuning typically adds a small output layer on top of the pre-trained BERT encoder and trains the combined model on labelled data for the target task.
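The "small output layer" can be as simple as one linear layer followed by a softmax over the classes. The sketch below shows only the forward pass and the loss for a single example, using a toy four-dimensional vector as a stand-in for the encoder output at the classification token; a real encoder output is much larger, and fine-tuning would update both the head and the encoder weights by gradient descent. All names and values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(cls_embedding, weights, biases):
    """Apply a linear output layer to the [CLS] embedding, then softmax.

    weights has one row per class; each row has one entry per embedding dimension.
    """
    logits = [
        sum(w * x for w, x in zip(row, cls_embedding)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def cross_entropy(probabilities, label):
    """Negative log-likelihood of the correct class; minimised during fine-tuning."""
    return -math.log(probabilities[label])


# Toy stand-in for the encoder's [CLS] output and a 2-class head.
cls_embedding = [0.5, -1.0, 0.25, 2.0]
weights = [[0.1, 0.0, -0.2, 0.3], [-0.1, 0.2, 0.0, -0.3]]
biases = [0.0, 0.0]

probs = classify(cls_embedding, weights, biases)
loss = cross_entropy(probs, label=0)
print(probs, loss)
```

Because the head adds only one weight row per class, almost all of the model's capacity comes from the pre-trained encoder, which is why relatively little labelled data is needed.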

Tasks BERT has been fine-tuned for include sentiment analysis, named entity recognition, question answering (including the Stanford Question Answering Dataset, SQuAD), text classification, natural language inference, and semantic textual similarity. On the GLUE benchmark, a suite of NLP evaluation tasks, BERT raised the overall score to 80.5 percent at release, a 7.7 point absolute improvement over the previous best (Devlin et al., 2019).

Variants and Descendants

The success of BERT prompted a large family of derivative models. RoBERTa (Robustly Optimized BERT Pretraining Approach), developed by Facebook AI Research, removed the NSP objective and trained for longer on more data with larger batches, achieving improved performance (Liu et al., 2019). DistilBERT, produced via knowledge distillation, retains approximately 97 percent of BERT's language understanding performance while being 40 percent smaller and 60 percent faster at inference (Sanh et al., 2019). ALBERT (A Lite BERT) introduced parameter sharing across layers to reduce model size without proportional performance loss.
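Knowledge distillation trains a small "student" model to match the output distribution of a larger "teacher". The sketch below shows one common form of the soft-target loss: a cross-entropy between temperature-softened teacher and student distributions. The DistilBERT paper combines a loss of this kind with a masked language modelling loss and a cosine embedding loss, which are omitted here. The temperature of 2.0, the logits, and the function names are illustrative assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    teacher = softmax_with_temperature(teacher_logits, temperature)
    student = softmax_with_temperature(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 4.0, 1.0]     # prefers a different class

loss_close = distillation_loss(teacher_logits, close_student)
loss_far = distillation_loss(teacher_logits, far_student)
print(loss_close, loss_far)
```

The loss is lower for the student whose predictions resemble the teacher's, so minimising it pulls the student towards the teacher's behaviour, including the relative probabilities the teacher assigns to incorrect classes.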

Domain-specific variants include BioBERT for biomedical text, LegalBERT for legal documents, FinBERT for financial text, and multilingual variants such as mBERT, which supports over 100 languages from a single pre-trained checkpoint.

Impact and Legacy

BERT fundamentally changed the NLP research landscape. Prior to BERT, NLP systems typically relied on task-specific architectures with limited transfer across domains. BERT demonstrated that a large general-purpose pre-trained encoder, fine-tuned on small task-specific datasets, could outperform bespoke models trained from scratch on large task-specific datasets.

Google deployed BERT in Google Search in October 2019, describing it as the biggest leap forward for Search in the past five years. The model improved understanding of natural language queries, particularly for longer, conversational searches where prepositions and word order carry significant meaning.

As of 2025, BERT-family models remain widely used for natural language understanding tasks in production systems, even as generative LLMs have become dominant for text generation. BERT's encoder-only design makes it computationally efficient for classification and semantic embedding applications at scale.

Malaysian Context — BERT in Local NLP and Enterprise Applications

BERT and its multilingual variant mBERT have seen adoption across Malaysian enterprise and academic contexts, particularly for handling Malaysia's multilingual landscape. Bahasa Malaysia, the official language, is used alongside English, Mandarin Chinese, and Tamil, which creates challenges for standard NLP models trained predominantly on English text.

Malaysian researchers and institutions have developed Bahasa Malaysia fine-tuned BERT models. Universiti Teknologi Malaysia (UTM) and Universiti Malaya (UM) have published research on fine-tuning BERT for Bahasa Malaysia sentiment analysis and named entity recognition tasks relevant to Malaysian social media content and government documents.

In the banking sector, institutions such as Maybank and CIMB have deployed BERT-based models for customer query classification, intent detection in chatbots, and automated document processing. BERT's ability to handle bilingual Malay-English input — the code-switching common in Malaysian text — has made it particularly valuable for local customer-facing applications.

MDEC (Malaysia Digital Economy Corporation) has highlighted NLP capabilities, including BERT-based models, as a component of Malaysia's AI talent development agenda. Through programmes in collaboration with HRD Corp, Malaysian technology workers have been trained in transformer-based NLP, including fine-tuning pre-trained models for local enterprise applications.

Telecommunications companies such as Telekom Malaysia (TM) and Maxis have used BERT-based classifiers for customer support ticket routing and churn prediction from unstructured text. The model's multilingual capabilities allow classification of tickets written in mixed Malay and English without requiring separate pipelines for each language.

References

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019.
Liu, Y. et al. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692.
Sanh, V. et al. (2019). DistilBERT, a distilled version of BERT. arXiv:1910.01108.
Nayel, H. and Sharf, A. (2025). BERT applications in natural language processing: a review. Artificial Intelligence Review. Springer Nature.
Google AI Blog. (2018). Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Google.

Tags: bert, nlp, language-model, google, transformer

Full name	Bidirectional Encoder Representations from Transformers
Developed by	Google AI Language
Released	October 2018
Architecture	Transformer encoder stack
Key use	Natural language understanding
Related	RoBERTa, DistilBERT, ALBERT, XLNet
