AIWiki
Malaysia

AI Foundations

85 articles in this section

Foundations

Activation Function

A mathematical function applied to a neuron's output in a neural network that introduces non-linearity, enabling models to learn complex patterns beyond simple linear relationships.

7 min read · Updated June 2026
Foundations

AI Alignment

AI alignment is the field of research dedicated to ensuring that artificial intelligence systems pursue goals, values, and behaviours that are consistent with human intentions.

5 min read · Updated May 2026
Foundations

AI Bias

Systematic and unfair discrimination introduced into artificial intelligence systems through biased training data, flawed model design, or problematic deployment decisions, leading to unequal outcomes across demographic groups or categories.

8 min read · Updated June 2026
Foundations

AI Literacy

AI literacy is the set of knowledge, skills, and attitudes that enable individuals to understand, evaluate, and use artificial intelligence tools effectively and responsibly in personal, professional, and civic contexts.

7 min read · Updated June 2026
Foundations

AI Planning

AI planning is the discipline of automatically generating a sequence of actions that an intelligent agent can execute to move from an initial state to a goal, increasingly used inside LLM-based agents to decompose and reason about complex tasks.

5 min read · Updated June 2026
Foundations

Artificial Intelligence

Artificial intelligence (AI) is the simulation of human intelligence processes by computer systems, encompassing learning, reasoning, problem-solving, perception, and language understanding.

5 min read · Updated May 2026
Foundations

Attention Mechanism

A neural network technique that enables models to dynamically weight the relevance of different parts of an input sequence when producing each output element, forming the core of transformer architectures.

6 min read · Updated May 2026
Foundations

Autoencoder

An autoencoder is a type of artificial neural network trained to reconstruct its input through a compressed internal representation, used for dimensionality reduction, feature learning, and anomaly detection.

5 min read · Updated May 2026
Foundations

Backpropagation

Backpropagation is the primary algorithm for training neural networks, computing gradients of a loss function with respect to each weight by applying the chain rule of calculus in reverse through the network layers.

6 min read · Updated May 2026
Foundations

Batch Normalisation

Batch normalisation is a deep learning technique that normalises the activations of each layer within a mini-batch to accelerate training and improve model stability.

5 min read · Updated May 2026
Foundations

Bayesian Inference

Bayesian inference is a statistical method that uses Bayes' theorem to update the probability of a hypothesis as new evidence becomes available, providing a principled framework for reasoning under uncertainty.

6 min read · Updated May 2026
Foundations

BM25

BM25 (Best Matching 25) is a probabilistic ranking function used in information retrieval that scores each document by how often the query terms occur in it, weighted by inverse document frequency and normalised for document length.

7 min read · Updated June 2026
Foundations

Causal AI

Causal AI is an approach to artificial intelligence that incorporates causal reasoning into machine learning models, enabling them to go beyond correlation-based prediction to answer questions about interventions and counterfactual outcomes.

6 min read · Updated June 2026
Foundations

Constitutional AI

Constitutional AI is an alignment method developed by Anthropic that trains language models to follow a set of written ethical principles by using the model itself to critique and revise its own outputs, reducing dependence on human feedback for harmlessness.

6 min read · Updated May 2026
Foundations

Context Window

The maximum number of tokens, including the prompt, prior conversation, retrieved documents, and the model's own output, that a large language model can attend to at once.

5 min read · Updated May 2026
Foundations

Continual Learning

Continual learning is a machine learning paradigm in which models incrementally acquire knowledge from sequential tasks or data streams without forgetting previously learned information, addressing the stability-plasticity trade-off inherent in neural networks.

7 min read · Updated June 2026
Foundations

Contrastive Learning

Contrastive learning is a self-supervised machine learning paradigm that trains models to produce similar representations for related data pairs and dissimilar representations for unrelated pairs, enabling powerful feature learning without labelled data.

6 min read · Updated June 2026
Foundations

Convolutional Neural Network

A convolutional neural network (CNN) is a type of deep neural network that uses convolutional layers to automatically learn spatial hierarchies of features from grid-structured data, most commonly images.

7 min read · Updated May 2026
Foundations

Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors equal to the cosine of the angle between them, widely used to compare embeddings in search and machine learning.

4 min read · Updated June 2026
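
As a rough illustration of the cosine similarity definition above, the sketch below computes the cosine of the angle between two vectors in plain Python. The function name `cosine_similarity` and the list-of-floats input are assumptions made for this example, not something taken from the linked article.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norms of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, regardless of their lengths.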
Foundations

Cross-Entropy Loss

Cross-entropy loss is the standard objective function for training classification models, measuring the divergence between a predicted probability distribution and the true distribution of labels.

4 min read · Updated June 2026
Foundations

Deep Learning

Deep learning is a subfield of machine learning that uses multi-layered artificial neural networks to learn hierarchical representations from data, enabling state-of-the-art performance across vision, language, and speech tasks.

7 min read · Updated May 2026
Foundations

Differential Privacy

Differential privacy is a mathematical framework for analysing data that guarantees the output of a computation reveals little about any single individual, achieved by adding calibrated random noise to limit each record's influence.

5 min read · Updated June 2026
Foundations

Diffusion Model

A class of generative AI models that learn to reverse a gradual noise-addition process, enabling the generation of high-quality images, audio, and video from random noise guided by text or other conditioning signals.

7 min read · Updated May 2026
Foundations

Direct Preference Optimization

Direct Preference Optimization (DPO) is a stable, computationally efficient algorithm for aligning large language models with human preferences by directly optimising a policy from comparison data, without training a separate reward model or using reinforcement learning.

6 min read · Updated June 2026
Foundations

Domain Adaptation

Domain adaptation is a machine learning technique that transfers a model trained on a labelled source domain to perform effectively on a related but distinct target domain with limited or no labelled target data, addressing distribution shift between domains.

7 min read · Updated June 2026
Foundations

Dropout

A regularisation technique in deep learning that randomly deactivates neurons during training, preventing co-adaptation and improving generalisation. Introduced by Hinton and colleagues in 2012 and formalised in 2014.

5 min read · Updated May 2026
Foundations

Embedding

An embedding is a dense numerical vector representation of data — such as text, images, or audio — that encodes semantic meaning in a continuous high-dimensional space, enabling machine learning models to measure similarity and relationships.

6 min read · Updated May 2026
Foundations

Encoder-Decoder Architecture

A neural network design pattern that compresses an input sequence into an internal representation using an encoder, and then generates an output sequence from that representation using a decoder, foundational to machine translation, summarisation, and many other sequence-to-sequence tasks.

6 min read · Updated May 2026
Foundations

Federated Learning

Federated learning is a machine learning paradigm in which a model is trained across multiple decentralised devices or servers holding local data, without exchanging the raw data itself, preserving privacy while enabling collaborative model improvement.

6 min read · Updated May 2026
Foundations

Few-Shot Learning

Few-shot learning is a machine learning paradigm in which a model learns to perform new tasks or recognise new classes from only a small number of labelled training examples, often just one to five samples per class.

6 min read · Updated May 2026
Foundations

FlashAttention

FlashAttention is an IO-aware exact attention algorithm that restructures the standard attention computation into memory-efficient tiled blocks, dramatically reducing GPU memory usage and wall-clock time for transformer models on long sequences.

6 min read · Updated June 2026
Foundations

Gaussian Process

A non-parametric Bayesian model that defines a distribution over functions, widely used in regression, optimisation, and uncertainty quantification.

6 min read · Updated June 2026
Foundations

Generative Adversarial Network

A generative adversarial network (GAN) is a class of machine learning frameworks in which two neural networks, a generator and a discriminator, compete against each other to produce synthetic data indistinguishable from real examples.

6 min read · Updated May 2026
Foundations

Gradient Boosting

A machine learning ensemble technique that builds predictive models sequentially, fitting each new model to the negative gradient of the loss so that it corrects the errors of its predecessors.

4 min read · Updated May 2026
Foundations

Gradient Descent

Gradient descent is an iterative optimisation algorithm that minimises a loss function by repeatedly updating model parameters in the direction of steepest descent, that is, along the negative gradient.

6 min read · Updated May 2026
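
The update rule in the gradient descent entry above can be sketched in a few lines. This is a minimal one-dimensional example under stated assumptions: the function name `gradient_descent`, the fixed learning rate, the step count, and the toy loss f(x) = (x - 3)^2 are all chosen for illustration only.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimise a 1-D function given its gradient, using a fixed step size."""
    x = x0
    for _ in range(steps):
        # Move against the gradient, i.e. in the direction of steepest descent.
        x = x - learning_rate * grad(x)
    return x


# Toy loss f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

In practice the learning rate usually needs tuning: too large a value can make the iterates diverge, and too small a value makes convergence slow.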
Foundations

Graph Neural Network

A class of deep learning models designed to operate on graph-structured data, enabling nodes to aggregate and propagate information across their neighbourhoods through a message-passing mechanism.

6 min read · Updated June 2026
Foundations

Hallucination (AI)

A phenomenon in which an artificial intelligence system generates output that is factually incorrect, fabricated, or unsupported by its input, while presenting it with apparent confidence.

6 min read · Updated May 2026
Foundations

Hidden Markov Model

A statistical model that represents systems with unobservable (hidden) states that emit observable outputs, used widely in speech recognition, bioinformatics, and time-series analysis.

5 min read · Updated June 2026
Foundations

Instruction Tuning

Instruction tuning is a supervised fine-tuning technique that trains large language models on datasets of instruction-response pairs, enabling models to follow natural language directions and generalise to unseen tasks in a zero-shot or few-shot setting.

7 min read · Updated June 2026
Foundations

K-Means Clustering

K-means clustering is an unsupervised machine learning algorithm that partitions a dataset into k groups by minimising the sum of squared distances between data points and their assigned cluster centroids.

4 min read · Updated May 2026
Foundations

Knowledge Graph

A structured knowledge representation that encodes entities and their relationships as a directed labelled graph, enabling machines to reason over interconnected facts across diverse domains.

6 min read · Updated June 2026
Foundations

Large Language Models

Large language models (LLMs) are AI systems trained on vast corpora of text to predict and generate natural language. They underpin modern chatbots, code assistants, and generative AI applications.

5 min read · Updated May 2026
Foundations

Layer Normalisation

Layer normalisation is a technique that normalises the inputs across the features of a single training example, stabilising and accelerating the training of deep neural networks, especially transformers.

4 min read · Updated June 2026
Foundations

Long Short-Term Memory (LSTM)

Long Short-Term Memory is a recurrent neural network architecture designed to learn long-range dependencies in sequential data by using gating mechanisms to control information flow.

5 min read · Updated May 2026
Foundations

Machine Learning

Machine learning is a subfield of artificial intelligence in which systems improve their performance on tasks through experience — by automatically learning patterns from data rather than following explicitly programmed rules.

4 min read · Updated May 2026
Foundations

Mamba (Structured State Space Model)

Mamba is a selective state space model architecture that achieves linear-time sequence modelling, offering a computationally efficient alternative to the Transformer for long-context tasks.

6 min read · Updated June 2026
Foundations

Markov Decision Process

A Markov decision process is a mathematical framework for modelling sequential decision-making in which outcomes are partly random and partly under the control of a decision-maker.

4 min read · Updated May 2026
Foundations

Meta-Learning

A machine learning paradigm in which models learn how to learn, acquiring inductive biases across a distribution of tasks so they can adapt rapidly to new tasks with minimal data.

5 min read · Updated May 2026
Foundations

Mixture of Experts

Mixture of Experts (MoE) is a machine learning architecture in which a model routes each input to a small subset of specialised sub-networks called experts, enabling large model capacity at a fraction of the compute cost.

6 min read · Updated June 2026
Foundations

Monte Carlo Methods

A broad class of computational algorithms that use repeated random sampling to obtain numerical results, widely used in machine learning for Bayesian inference, reinforcement learning, and uncertainty estimation.

5 min read · Updated May 2026
Foundations

Multi-Task Learning

Multi-task learning is a machine learning approach in which a model is trained simultaneously on multiple related tasks, using shared representations to improve generalisation and data efficiency compared to training separate single-task models.

7 min read · Updated June 2026
Foundations

Multimodal AI

Artificial intelligence systems that can process, understand, and generate information across multiple data types simultaneously, including text, images, audio, video, and other modalities.

5 min read · Updated May 2026
Foundations

Natural Language Generation

Natural Language Generation (NLG) is a subfield of artificial intelligence that automatically produces human-readable text from structured data, semantic representations, or other machine-readable inputs.

7 min read · Updated June 2026
Foundations

Natural Language Processing

Natural language processing (NLP) is the subfield of AI concerned with enabling computers to understand, interpret, manipulate, and generate human language in both text and speech form.

3 min read · Updated May 2026
Foundations

Neural Network

A neural network is a computational model inspired by biological brains, composed of interconnected layers of nodes that learn patterns from data through weighted connections.

5 min read · Updated May 2026
Foundations

Neural Scaling Laws

Neural scaling laws are empirical relationships describing how the performance of neural networks improves predictably as a function of model size, dataset size, and compute budget, enabling principled resource allocation for AI training.

7 min read · Updated June 2026
Foundations

Neuro-symbolic AI

Neuro-symbolic AI is a hybrid artificial intelligence paradigm that combines neural network-based learning with symbolic reasoning, integrating the pattern recognition strengths of deep learning with the structured reasoning and interpretability of symbolic methods.

6 min read · Updated June 2026
Foundations

Overfitting

Overfitting is a modelling error in machine learning where a model learns the training data too closely, including its noise, and consequently performs poorly on new, unseen data.

5 min read · Updated June 2026
Foundations

Physical AI

Physical AI is artificial intelligence that perceives, reasons about, and acts upon the physical world through embodied systems such as robots, autonomous vehicles, and automated facilities, bridging digital intelligence and real-world action.

5 min read · Updated June 2026
Foundations

Precision and Recall

Precision and recall are two complementary metrics used to evaluate classification models, measuring respectively the correctness of positive predictions and the completeness with which actual positives are identified.

4 min read · Updated June 2026
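
To make the precision and recall entry above concrete, here is a small sketch for binary classification. The helper name `precision_recall`, the `positive` parameter, and the convention of returning 0.0 when a denominator is zero are assumptions for this example.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute (precision, recall) for one positive class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: actual positives the model missed.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Precision falls when the model raises false alarms, while recall falls when it misses actual positives, which is why the two are usually reported together.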
Foundations

Principal Component Analysis

An unsupervised statistical technique that transforms correlated variables into a smaller set of uncorrelated components that preserve as much variance in the original data as possible.

4 min read · Updated May 2026
Foundations

Proximal Policy Optimization

A reinforcement learning algorithm developed by OpenAI that stabilises policy gradient training by constraining the size of policy updates, widely used for fine-tuning large language models through RLHF.

7 min read · Updated June 2026
Foundations

Random Forest

Random forest is an ensemble machine learning algorithm that builds many decision trees on bootstrapped samples and aggregates their predictions to improve accuracy and reduce overfitting.

6 min read · Updated May 2026
Foundations

Recurrent Neural Network

A recurrent neural network (RNN) is a class of neural network designed for sequential data, where connections between nodes form directed cycles allowing information to persist across time steps.

6 min read · Updated May 2026
Foundations

Regularisation (Machine Learning)

Regularisation is a collection of techniques in machine learning that constrain models during training to reduce overfitting and improve generalisation to unseen data.

5 min read · Updated May 2026
Foundations

Reinforcement Learning

A machine learning paradigm in which an agent learns to make sequential decisions by interacting with an environment and optimising for cumulative reward through trial and error.

7 min read · Updated June 2026
Foundations

Reinforcement Learning from Human Feedback

A machine learning technique that trains a reward model from human preference data and uses it to align large language models with human values, safety requirements, and intended behaviour through reinforcement learning.

7 min read · Updated May 2026
Foundations

Residual Network

A deep convolutional neural network architecture introduced by Microsoft Research in 2015 that uses skip connections to enable training of very deep networks, winning the ILSVRC 2015 ImageNet classification challenge with a top-5 error rate of 3.57%.

7 min read · Updated June 2026
Foundations

Responsible AI

A framework of principles and practices that guide the development and deployment of artificial intelligence systems to ensure they are safe, fair, transparent, accountable, and aligned with human values.

7 min read · Updated June 2026
Foundations

Self-Supervised Learning

A machine learning training paradigm in which a model generates its own supervisory signal from unlabelled data by solving pretext tasks, learning rich representations without human-annotated labels.

6 min read · Updated June 2026
Foundations

Sequence-to-Sequence Model

A neural network architecture composed of an encoder that processes an input sequence into a fixed representation and a decoder that generates an output sequence from that representation, forming the foundation for machine translation, summarisation, and dialogue systems.

7 min read · Updated June 2026
Foundations

Softmax Function

The softmax function converts a vector of real-valued scores into a probability distribution, and is widely used as the output layer of neural network classifiers and in attention mechanisms.

4 min read · Updated June 2026
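
The softmax entry above can be illustrated with a short sketch that turns raw scores into probabilities. The function name `softmax` and the max-subtraction step (a common trick for numerical stability that leaves the result unchanged) are choices made for this example.

```python
import math


def softmax(scores):
    """Convert a list of real-valued scores into a probability distribution."""
    # Subtracting the maximum avoids overflow in exp() without changing the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The outputs are all positive and sum to 1, and larger scores receive larger probabilities, which is what makes softmax suitable for classifier outputs and attention weights.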
Foundations

Sovereign AI

Sovereign AI is the capacity of a nation to develop, deploy, and govern artificial intelligence using its own infrastructure, data, talent, and models, ensuring strategic autonomy and alignment with domestic laws and values.

5 min read · Updated June 2026
Foundations

Sparse Autoencoder

A sparse autoencoder is a type of autoencoder trained with a sparsity constraint that forces most neurons in the hidden layer to be inactive for any given input, producing a disentangled, interpretable feature decomposition.

7 min read · Updated June 2026
Foundations

Support Vector Machine

A support vector machine (SVM) is a supervised machine learning algorithm that finds the optimal hyperplane separating data points of different classes by maximising the margin between the boundary and the nearest training examples.

7 min read · Updated May 2026
Foundations

TinyML

TinyML is a field of machine learning focused on running machine learning models on microcontrollers and other resource-constrained edge devices that typically operate with milliwatts of power and kilobytes of memory.

6 min read · Updated May 2026
Foundations

Token

A token is the smallest unit of text processed by a large language model, typically a word, subword, or character, and serves as the basic element of both its input and its output.

6 min read · Updated June 2026
Foundations

Tokenisation

Tokenisation is the process of breaking text into discrete units called tokens — which may represent words, subwords, characters, or symbols — that serve as the fundamental input units for language models and other natural language processing systems.

6 min read · Updated May 2026
Foundations

Transfer Learning

Transfer learning is a machine learning technique in which a model pre-trained on one task or dataset is adapted for a different but related task, enabling high performance with significantly less data and compute than training from scratch.

6 min read · Updated May 2026
Foundations

Transformer Architecture

A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel, forming the foundation of modern large language models and multimodal AI systems.

7 min read · Updated May 2026
Foundations

Variational Autoencoder

A variational autoencoder is a generative neural network that learns a probabilistic latent representation of data, enabling smooth sampling and reconstruction of new examples.

5 min read · Updated May 2026
Foundations

Vision Transformer

The Vision Transformer (ViT) is a deep learning model that applies the transformer architecture originally designed for NLP directly to sequences of image patches, achieving state-of-the-art results on visual recognition tasks.

5 min read · Updated June 2026
Foundations

Word2Vec

A neural network-based algorithm developed by Google in 2013 that learns dense vector representations of words from large text corpora, capturing semantic and syntactic relationships through distributional similarity.

7 min read · Updated June 2026
Foundations

World Models

World models are AI systems that build internal representations of how the environment works, enabling machines to simulate, plan, and reason about future states without requiring direct experience.

7 min read · Updated June 2026
Foundations

Zero-Shot Learning

Zero-shot learning is a machine learning paradigm in which a model makes accurate predictions on categories it has never seen during training by leveraging semantic descriptions or attribute representations.

6 min read · Updated May 2026