What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

Search Results

25 results for “NLP”

Malaysian Context

AI in Malaysian Legal Industry

AI in Malaysia's legal industry encompasses the adoption of machine learning, natural language processing, and generative AI tools by law firms, the judiciary, and legal service providers to automate research, drafting, and compliance tasks.

7 min read · Updated June 2026

Models

BERT

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer-based language model developed by Google that reads text bidirectionally to understand word context in natural language tasks.

6 min read · Updated June 2026

Foundations

BM25

BM25 (Best Matching 25) is a probabilistic ranking function used in information retrieval that scores documents against a query using the frequency of each query term in the document, inverse document frequency, and document length normalisation.

7 min read · Updated June 2026
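
As a rough illustration of the BM25 entry above, the sketch below scores three toy documents against a two-word query. It assumes the commonly used Okapi BM25 form with a smoothed IDF; the function name `bm25_scores`, the sample documents, and the defaults `k1=1.5` and `b=0.75` are illustrative choices, not taken from the article.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document against a tokenised query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (the "+ 1" keeps it positive).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # k1 controls term-frequency saturation; b controls length normalisation.
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus, whitespace-tokenised for simplicity.
docs = [
    "ai tools for law firms in malaysia".split(),
    "weather forecast for penang".split(),
    "ai regulations and ai governance in malaysia".split(),
]
scores = bm25_scores("ai malaysia".split(), docs)
for doc, score in zip(docs, scores):
    print(round(score, 3), " ".join(doc))
```

The second document shares no terms with the query and scores zero, while the third outranks the first because it mentions "ai" twice at the same document length.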

Applications

Chatbot

A chatbot is a software application designed to simulate human conversation through text or voice, ranging from simple rule-based systems to sophisticated AI assistants powered by large language models.

3 min read · Updated May 2026

Companies & Tools

Cohere

Cohere is a Canadian AI company specialising in enterprise large language models, offering Command, Embed, and Rerank model families alongside secure deployment infrastructure designed for regulated industries.

6 min read · Updated May 2026

Foundations

Embedding

An embedding is a dense numerical vector representation of data — such as text, images, or audio — that encodes semantic meaning in a continuous high-dimensional space, enabling machine learning models to measure similarity and relationships.

6 min read · Updated May 2026
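
To make the idea of measuring similarity between embeddings concrete, here is a minimal sketch using cosine similarity. The four-dimensional vectors are hand-picked, hypothetical values for illustration only; real embeddings come from a trained model and usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (not produced by any real model).
toy_embeddings = {
    "lawyer": [0.9, 0.1, 0.3, 0.0],
    "solicitor": [0.8, 0.2, 0.4, 0.1],
    "durian": [0.0, 0.9, 0.1, 0.7],
}

sim_related = cosine_similarity(toy_embeddings["lawyer"], toy_embeddings["solicitor"])
sim_unrelated = cosine_similarity(toy_embeddings["lawyer"], toy_embeddings["durian"])
print(f"lawyer vs solicitor: {sim_related:.2f}")
print(f"lawyer vs durian:    {sim_unrelated:.2f}")
```

With these toy values the related pair scores close to 1 and the unrelated pair close to 0, which is the property that similarity-based retrieval relies on.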

Foundations

Instruction Tuning

Instruction tuning is a supervised fine-tuning technique that trains large language models on datasets of instruction-response pairs, enabling models to follow natural language directions and generalise to unseen tasks in a zero-shot or few-shot setting.

7 min read · Updated June 2026

Foundations

Large Language Models

Large language models (LLMs) are AI systems trained on vast corpora of text to predict and generate natural language. They underpin modern chatbots, code assistants, and generative AI applications.

5 min read · Updated May 2026

Applications

Machine Translation

Machine translation is the automated conversion of text or speech from one natural language into another using rule-based, statistical, or neural systems.

6 min read · Updated May 2026

Applications

Named Entity Recognition

Named entity recognition (NER) is a natural language processing task that identifies and classifies named entities in text — such as people, organisations, locations, and dates — into predefined categories.

6 min read · Updated May 2026

Foundations

Natural Language Generation

Natural Language Generation (NLG) is a subfield of artificial intelligence that automatically produces human-readable text from structured data, semantic representations, or other machine-readable inputs.

7 min read · Updated June 2026

Foundations

Natural Language Processing

Natural language processing (NLP) is the subfield of AI concerned with enabling computers to understand, interpret, manipulate, and generate human language in both text and speech form.

3 min read · Updated May 2026

Applications

Question Answering

Question answering is the natural language processing task of producing accurate answers to questions posed in natural language, often using information retrieval, reading comprehension, or large language models.

5 min read · Updated May 2026

Foundations

Recurrent Neural Network

A recurrent neural network (RNN) is a class of neural network designed for sequential data, where connections between nodes form directed cycles allowing information to persist across time steps.

6 min read · Updated May 2026

Applications

Reranking

Reranking is a two-stage information retrieval technique in which a fast first-stage retriever generates candidate documents, and a more accurate but computationally expensive model re-scores and reorders them.

6 min read · Updated June 2026
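
The two-stage flow in the reranking entry above can be sketched as follows. Both stages here are deliberately simple stand-ins: the first stage counts overlapping query words, and `toy_score` is a hypothetical placeholder for the slower, more accurate model (for example a cross-encoder) that a real system would use. All function names and documents are illustrative assumptions.

```python
def first_stage(query, docs, top_k=3):
    """Cheap retriever: rank by how many distinct query words a document contains."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def toy_score(query, doc):
    """Stand-in for an expensive scorer: fraction of document words found in the query."""
    q = set(query.lower().split())
    words = doc.lower().split()
    return sum(w in q for w in words) / len(words)

def rerank(query, candidates, score_fn):
    """Second stage: re-score only the shortlisted candidates and reorder them."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)

# Hypothetical toy corpus.
docs = [
    "ai and law in malaysia a long overview of many unrelated topics",
    "penang food guide",
    "ai tools",
    "ai law firms",
]
query = "ai law"
candidates = first_stage(query, docs, top_k=3)
reranked = rerank(query, candidates, toy_score)
print(reranked)
```

The design point is that the expensive scorer runs only on the shortlist, so its cost depends on `top_k` rather than on the size of the full collection.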

Applications

Semantic Search

Semantic search is a search paradigm that retrieves results based on the meaning and intent of a query rather than exact keyword matches, using vector embeddings to measure conceptual similarity between text.

6 min read · Updated May 2026

Infrastructure

Sentence Transformers

Sentence Transformers are neural network models that encode sentences, paragraphs, or short documents into fixed-length dense vector embeddings optimised for semantic similarity comparison.

6 min read · Updated June 2026

Applications

Sentiment Analysis

Sentiment analysis is a natural language processing technique that automatically identifies and classifies the emotional tone of text as positive, negative, or neutral, and is widely used in customer feedback, social media monitoring, and financial analysis.

6 min read · Updated May 2026

Foundations

Sequence-to-Sequence Model

A sequence-to-sequence (seq2seq) model is a neural network architecture composed of an encoder that processes an input sequence into a fixed representation and a decoder that generates an output sequence from that representation, forming the foundation for machine translation, summarisation, and dialogue systems.

7 min read · Updated June 2026

Applications

Speech Recognition

Speech recognition, or automatic speech recognition (ASR), is the technology that enables computers to identify and transcribe spoken language into text using acoustic models, language models, and deep learning architectures.

6 min read · Updated May 2026

Applications

Text Summarisation

Text summarisation is the natural language processing task of producing a shorter version of a document that preserves its key information, using extractive or abstractive techniques.

4 min read · Updated May 2026

Foundations

Token

A token is the smallest unit of text processed by a large language model. It typically represents a word, subword, or character and serves as the fundamental input and output element during inference.

6 min read · Updated June 2026

Foundations

Tokenisation

Tokenisation is the process of breaking text into discrete units called tokens — which may represent words, subwords, characters, or symbols — that serve as the fundamental input units for language models and other natural language processing systems.

6 min read · Updated May 2026
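
As a small illustration of the tokenisation entry above, the sketch below splits a sentence at two granularities: words with punctuation, and individual characters. It is a simplified assumption for demonstration; production language models generally use learned subword vocabularies (such as byte-pair encoding), which this example does not implement. The function names and sample sentence are illustrative.

```python
import re

def word_tokenise(text):
    """Split text into word tokens and standalone punctuation symbols."""
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokenise(text):
    """Split text into individual characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]

sentence = "AI helps lawyers, fast."
word_tokens = word_tokenise(sentence)
char_tokens = char_tokenise("AI helps")

print(word_tokens)  # ['AI', 'helps', 'lawyers', ',', 'fast', '.']
print(char_tokens)  # ['A', 'I', 'h', 'e', 'l', 'p', 's']
```

Word-level splitting gives short sequences but a large vocabulary, while character-level splitting gives a tiny vocabulary but long sequences; subword schemes sit between the two.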

Foundations

Vision Transformer

The Vision Transformer (ViT) is a deep learning model that applies the transformer architecture originally designed for NLP directly to sequences of image patches, achieving state-of-the-art results on visual recognition tasks.

5 min read · Updated June 2026

Foundations

Word2Vec

Word2Vec is a neural network-based algorithm developed by Google in 2013 that learns dense vector representations of words from large text corpora, capturing semantic and syntactic relationships through distributional similarity.

7 min read · Updated June 2026