AIWiki
Malaysia

Search Results

25 results for NLP

Malaysian Context

AI in Malaysian Legal Industry

AI in Malaysia's legal industry encompasses the adoption of machine learning, natural language processing, and generative AI tools by law firms, the judiciary, and legal service providers to automate research, drafting, and compliance tasks.

7 min readUpdated June 2026
Models

BERT

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer-based language model developed by Google that reads text bidirectionally to understand word context in natural language tasks.

6 min readUpdated June 2026
Foundations

BM25

BM25 (Best Matching 25) is a probabilistic ranking function used in information retrieval that scores documents based on query term frequency, inverse document frequency, and document length normalisation.

7 min readUpdated June 2026
Applications

Chatbot

A chatbot is a software application designed to simulate human conversation through text or voice, ranging from simple rule-based systems to sophisticated AI assistants powered by large language models.

3 min readUpdated May 2026
Companies & Tools

Cohere

Cohere is a Canadian AI company specialising in enterprise large language models, offering Command, Embed, and Rerank model families alongside secure deployment infrastructure designed for regulated industries.

6 min readUpdated May 2026
Foundations

Embedding

An embedding is a dense numerical vector representation of data — such as text, images, or audio — that encodes semantic meaning in a continuous high-dimensional space, enabling machine learning models to measure similarity and relationships.

6 min readUpdated May 2026
Foundations

Instruction Tuning

Instruction tuning is a supervised fine-tuning technique that trains large language models on datasets of instruction-response pairs, enabling models to follow natural language directions and generalise to unseen tasks in a zero-shot or few-shot setting.

7 min readUpdated June 2026
Foundations

Large Language Models

Large language models (LLMs) are AI systems trained on vast corpora of text to predict and generate natural language. They underpin modern chatbots, code assistants, and generative AI applications.

5 min readUpdated May 2026
Applications

Machine Translation

Machine translation is the automated conversion of text or speech from one natural language into another using rule-based, statistical, or neural systems.

6 min readUpdated May 2026
Applications

Named Entity Recognition

Named entity recognition (NER) is a natural language processing task that identifies and classifies named entities in text — such as people, organisations, locations, and dates — into predefined categories.

6 min readUpdated May 2026
Foundations

Natural Language Generation

Natural Language Generation (NLG) is a subfield of artificial intelligence that automatically produces human-readable text from structured data, semantic representations, or other machine-readable inputs.

7 min readUpdated June 2026
Foundations

Natural Language Processing

Natural language processing (NLP) is the subfield of AI concerned with enabling computers to understand, interpret, manipulate, and generate human language in both text and speech form.

3 min readUpdated May 2026
Applications

Question Answering

Question answering is the natural language processing task of producing accurate answers to questions posed in natural language, often using information retrieval, reading comprehension, or large language models.

5 min readUpdated May 2026
Foundations

Recurrent Neural Network

A recurrent neural network (RNN) is a class of neural network designed for sequential data, where connections between nodes form directed cycles allowing information to persist across time steps.

6 min readUpdated May 2026
Applications

Reranking

Reranking is a two-stage information retrieval technique in which a fast first-stage retriever generates candidate documents, and a more accurate but computationally expensive model re-scores and reorders them.

6 min readUpdated June 2026
Applications

Semantic Search

Semantic search is a search paradigm that retrieves results based on the meaning and intent of a query rather than exact keyword matches, using vector embeddings to measure conceptual similarity between text.

6 min readUpdated May 2026
Infrastructure

Sentence Transformers

Sentence Transformers are neural network models that encode sentences, paragraphs, or short documents into fixed-length dense vector embeddings optimised for semantic similarity comparison.

6 min readUpdated June 2026
Applications

Sentiment Analysis

Sentiment analysis is a natural language processing technique that automatically identifies and classifies the emotional tone of text as positive, negative, or neutral, and is widely used in customer feedback, social media monitoring, and financial analysis.

6 min readUpdated May 2026
Foundations

Sequence-to-Sequence Model

A neural network architecture composed of an encoder that processes an input sequence into a fixed representation and a decoder that generates an output sequence from that representation, forming the foundation for machine translation, summarisation, and dialogue systems.

7 min readUpdated June 2026
Applications

Speech Recognition

Speech recognition, or automatic speech recognition (ASR), is the technology that enables computers to identify and transcribe spoken language into text using acoustic models, language models, and deep learning architectures.

6 min readUpdated May 2026
Applications

Text Summarisation

Text summarisation is the natural language processing task of producing a shorter version of a document that preserves its key information, using extractive or abstractive techniques.

4 min readUpdated May 2026
Foundations

Token

A token is the smallest unit of text processed by a large language model, typically representing a word, subword, or character used as the fundamental input and output element during inference.

6 min readUpdated June 2026
Foundations

Tokenisation

Tokenisation is the process of breaking text into discrete units called tokens — which may represent words, subwords, characters, or symbols — that serve as the fundamental input units for language models and other natural language processing systems.

6 min readUpdated May 2026
Foundations

Vision Transformer

The Vision Transformer (ViT) is a deep learning model that applies the transformer architecture originally designed for NLP directly to sequences of image patches, achieving state-of-the-art results on visual recognition tasks.

5 min readUpdated June 2026
Foundations

Word2Vec

A neural network-based algorithm developed by Google in 2013 that learns dense vector representations of words from large text corpora, capturing semantic and syntactic relationships through distributional similarity.

7 min readUpdated June 2026