Search Results
25 results for “NLP”
AI in Malaysian Legal Industry
AI in Malaysia's legal industry encompasses the adoption of machine learning, natural language processing, and generative AI tools by law firms, the judiciary, and legal service providers to automate research, drafting, and compliance tasks.
BERT
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer-based language model developed by Google that reads text bidirectionally to understand word context in natural language tasks.
BM25
BM25 (Best Matching 25) is a probabilistic ranking function used in information retrieval that scores documents based on query term frequency, inverse document frequency, and document length normalisation.
Chatbot
A chatbot is a software application designed to simulate human conversation through text or voice, ranging from simple rule-based systems to sophisticated AI assistants powered by large language models.
Cohere
Cohere is a Canadian AI company specialising in enterprise large language models, offering Command, Embed, and Rerank model families alongside secure deployment infrastructure designed for regulated industries.
Embedding
An embedding is a dense numerical vector representation of data — such as text, images, or audio — that encodes semantic meaning in a continuous high-dimensional space, enabling machine learning models to measure similarity and relationships.
Instruction Tuning
Instruction tuning is a supervised fine-tuning technique that trains large language models on datasets of instruction-response pairs, enabling models to follow natural language directions and generalise to unseen tasks in a zero-shot or few-shot setting.
Large Language Models
Large language models (LLMs) are AI systems trained on vast corpora of text to predict and generate natural language. They underpin modern chatbots, code assistants, and generative AI applications.
Machine Translation
Machine translation is the automated conversion of text or speech from one natural language into another using rule-based, statistical, or neural systems.
Named Entity Recognition
Named entity recognition (NER) is a natural language processing task that identifies and classifies named entities in text — such as people, organisations, locations, and dates — into predefined categories.
Natural Language Generation
Natural Language Generation (NLG) is a subfield of artificial intelligence that automatically produces human-readable text from structured data, semantic representations, or other machine-readable inputs.
Natural Language Processing
Natural language processing (NLP) is the subfield of AI concerned with enabling computers to understand, interpret, manipulate, and generate human language in both text and speech form.
Question Answering
Question answering is the natural language processing task of producing accurate answers to questions posed in natural language, often using information retrieval, reading comprehension, or large language models.
Recurrent Neural Network
A recurrent neural network (RNN) is a class of neural network designed for sequential data, where connections between nodes form directed cycles allowing information to persist across time steps.
Reranking
Reranking is a two-stage information retrieval technique in which a fast first-stage retriever generates candidate documents, and a more accurate but computationally expensive model re-scores and reorders them.
Semantic Search
Semantic search is a search paradigm that retrieves results based on the meaning and intent of a query rather than exact keyword matches, using vector embeddings to measure conceptual similarity between text.
Sentence Transformers
Sentence Transformers are neural network models that encode sentences, paragraphs, or short documents into fixed-length dense vector embeddings optimised for semantic similarity comparison.
Sentiment Analysis
Sentiment analysis is a natural language processing technique that automatically identifies and classifies the emotional tone of text as positive, negative, or neutral, and is widely used in customer feedback, social media monitoring, and financial analysis.
Sequence-to-Sequence Model
A neural network architecture composed of an encoder that processes an input sequence into a fixed representation and a decoder that generates an output sequence from that representation, forming the foundation for machine translation, summarisation, and dialogue systems.
Speech Recognition
Speech recognition, or automatic speech recognition (ASR), is the technology that enables computers to identify and transcribe spoken language into text using acoustic models, language models, and deep learning architectures.
Text Summarisation
Text summarisation is the natural language processing task of producing a shorter version of a document that preserves its key information, using extractive or abstractive techniques.
Token
A token is the smallest unit of text processed by a large language model, typically representing a word, subword, or character used as the fundamental input and output element during inference.
Tokenisation
Tokenisation is the process of breaking text into discrete units called tokens — which may represent words, subwords, characters, or symbols — that serve as the fundamental input units for language models and other natural language processing systems.
Vision Transformer
The Vision Transformer (ViT) is a deep learning model that applies the transformer architecture originally designed for NLP directly to sequences of image patches, achieving state-of-the-art results on visual recognition tasks.
Word2Vec
A neural network-based algorithm developed by Google in 2013 that learns dense vector representations of words from large text corpora, capturing semantic and syntactic relationships through distributional similarity.