AI Foundations
85 articles in this section
Activation Function
A mathematical function applied to a neuron's output in a neural network that introduces non-linearity, enabling models to learn complex patterns beyond simple linear relationships.
AI Alignment
AI alignment is the field of research dedicated to ensuring that artificial intelligence systems pursue goals, values, and behaviours that are consistent with human intentions.
AI Bias
Systematic and unfair discrimination introduced into artificial intelligence systems through biased training data, flawed model design, or problematic deployment decisions, leading to unequal outcomes across demographic groups or categories.
AI Literacy
AI literacy is the set of knowledge, skills, and attitudes that enable individuals to understand, evaluate, and use artificial intelligence tools effectively and responsibly in personal, professional, and civic contexts.
AI Planning
AI planning is the discipline of automatically generating a sequence of actions that an intelligent agent can execute to move from an initial state to a goal, increasingly used inside LLM-based agents to decompose and reason about complex tasks.
Artificial Intelligence
Artificial intelligence (AI) is the simulation of human intelligence processes by computer systems, encompassing learning, reasoning, problem-solving, perception, and language understanding.
Attention Mechanism
A neural network technique that enables models to dynamically weight the relevance of different parts of an input sequence when producing each output element, forming the core of transformer architectures.
Autoencoder
An autoencoder is a type of artificial neural network trained to reconstruct its input through a compressed internal representation, used for dimensionality reduction, feature learning, and anomaly detection.
Backpropagation
Backpropagation is the primary algorithm for training neural networks, computing gradients of a loss function with respect to each weight by applying the chain rule of calculus in reverse through the network layers.
Batch Normalisation
Batch normalisation is a deep learning technique that normalises each layer's activations using the mean and variance computed over a mini-batch, accelerating training and improving model stability.
Bayesian Inference
Bayesian inference is a statistical method that uses Bayes' theorem to update the probability of a hypothesis as new evidence becomes available, providing a principled framework for reasoning under uncertainty.
BM25
BM25 (Best Matching 25) is a probabilistic ranking function used in information retrieval that scores each document against a query using the frequency of the query terms in the document, their inverse document frequency, and document length normalisation.
Causal AI
Causal AI is an approach to artificial intelligence that incorporates causal reasoning into machine learning models, enabling them to go beyond correlation-based prediction to answer questions about interventions and counterfactual outcomes.
Constitutional AI
Constitutional AI is an alignment method developed by Anthropic that trains language models to follow a set of written ethical principles by using the model itself to critique and revise its own outputs, reducing dependence on human feedback for harmlessness.
Context Window
The maximum number of tokens (including the prompt, prior conversation, retrieved documents, and the model's own output) that a large language model can attend to at once.
Continual Learning
Continual learning is a machine learning paradigm in which models incrementally acquire knowledge from sequential tasks or data streams without forgetting previously learned information, addressing the stability-plasticity trade-off inherent in neural networks.
Contrastive Learning
Contrastive learning is a self-supervised machine learning paradigm that trains models to produce similar representations for related data pairs and dissimilar representations for unrelated pairs, enabling powerful feature learning without labelled data.
Convolutional Neural Network
A convolutional neural network (CNN) is a type of deep neural network that uses convolutional layers to automatically learn spatial hierarchies of features from grid-structured data, most commonly images.
Cosine Similarity
Cosine similarity is a measure of similarity between two non-zero vectors equal to the cosine of the angle between them, widely used to compare embeddings in search and machine learning.
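As a rough illustration of the definition above, the following sketch computes cosine similarity for two plain Python lists. The function name `cosine_similarity` and the choice to raise an error on zero vectors are assumptions made for this example, not part of any particular library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The measure is only defined for non-zero vectors.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Orthogonal vectors score 0; vectors pointing the same way score close to 1.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because only the angle matters, scaling either vector leaves the score unchanged, which is one reason the measure is popular for comparing embeddings of different magnitudes.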
Cross-Entropy Loss
Cross-entropy loss is the standard objective function for training classification models, measuring the divergence between a predicted probability distribution and the true distribution of labels.
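A minimal sketch of the computation for a single example, assuming both distributions are given as lists of probabilities over the same classes. The function name `cross_entropy` and the small clamp `eps` (used here only to avoid taking the logarithm of zero) are choices made for this illustration.

```python
import math

def cross_entropy(true_dist, pred_dist, eps=1e-12):
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i) for one example."""
    if len(true_dist) != len(pred_dist):
        raise ValueError("distributions must cover the same classes")
    # Classes with zero true probability contribute nothing to the sum.
    return -sum(
        t * math.log(max(p, eps))
        for t, p in zip(true_dist, pred_dist)
        if t > 0
    )

# One-hot label on class 1; the loss reduces to -log(0.8).
loss = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.8, 0.1])
```

With a one-hot label the loss depends only on the probability assigned to the correct class, so a more confident correct prediction gives a lower loss.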
Deep Learning
Deep learning is a subfield of machine learning that uses multi-layered artificial neural networks to learn hierarchical representations from data, enabling state-of-the-art performance across vision, language, and speech tasks.
Differential Privacy
Differential privacy is a mathematical framework for analysing data that guarantees the output of a computation reveals little about any single individual, achieved by adding calibrated random noise to limit each record's influence.
Diffusion Model
A class of generative AI models that learn to reverse a gradual noise-addition process, enabling the generation of high-quality images, audio, and video from random noise guided by text or other conditioning signals.
Direct Preference Optimization
Direct Preference Optimization (DPO) is a stable, computationally efficient algorithm for aligning large language models with human preferences by directly optimising a policy from comparison data, without training a separate reward model or using reinforcement learning.
Domain Adaptation
Domain adaptation is a machine learning technique that transfers a model trained on a labelled source domain to perform effectively on a related but distinct target domain with limited or no labelled target data, addressing distribution shift between domains.
Dropout
A regularisation technique in deep learning that randomly deactivates neurons during training, preventing co-adaptation and improving generalisation. Introduced by Hinton and colleagues in 2012 and formalised in 2014.
Embedding
An embedding is a dense numerical vector representation of data — such as text, images, or audio — that encodes semantic meaning in a continuous high-dimensional space, enabling machine learning models to measure similarity and relationships.
Encoder-Decoder Architecture
A neural network design pattern that compresses an input sequence into an internal representation using an encoder, and then generates an output sequence from that representation using a decoder, foundational to machine translation, summarisation, and many other sequence-to-sequence tasks.
Federated Learning
Federated learning is a machine learning paradigm in which a model is trained across multiple decentralised devices or servers holding local data, without exchanging the raw data itself, preserving privacy while enabling collaborative model improvement.
Few-Shot Learning
Few-shot learning is a machine learning paradigm in which a model learns to perform new tasks or recognise new classes from only a small number of labelled training examples, often just one to five samples per class.
Flash Attention
FlashAttention is an IO-aware exact attention algorithm that restructures the standard attention computation into memory-efficient tiled blocks, dramatically reducing GPU memory usage and wall-clock time for transformer models on long sequences.
Gaussian Process
A non-parametric Bayesian model that defines a distribution over functions, widely used in regression, optimisation, and uncertainty quantification.
Generative Adversarial Network
A generative adversarial network (GAN) is a machine learning framework in which two neural networks, a generator and a discriminator, compete against each other to produce synthetic data indistinguishable from real examples.
Gradient Boosting
A machine learning ensemble technique that builds predictive models sequentially, where each new model corrects the errors of its predecessors using gradient descent optimisation.
Gradient Descent
Gradient descent is an iterative optimisation algorithm that minimises a loss function by repeatedly updating model parameters in the direction of steepest descent, given by the negative gradient.
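The update rule can be sketched for a one-dimensional function as follows. The function name `gradient_descent`, the learning rate, the step count, and the toy objective f(x) = (x - 3)^2 are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Toy objective f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

On this objective each step shrinks the distance to the minimum by a constant factor, so the result lands very close to 3. A learning rate that is too large would make the iterates diverge instead.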
Graph Neural Network
A class of deep learning models designed to operate on graph-structured data, enabling nodes to aggregate and propagate information across their neighbourhoods through a message-passing mechanism.
Hallucination (AI)
A phenomenon in which an artificial intelligence system generates output that is factually incorrect, fabricated, or unsupported by its input, while presenting it with apparent confidence.
Hidden Markov Model
A statistical model that represents systems with unobservable (hidden) states that emit observable outputs, used widely in speech recognition, bioinformatics, and time-series analysis.
Instruction Tuning
Instruction tuning is a supervised fine-tuning technique that trains large language models on datasets of instruction-response pairs, enabling models to follow natural language directions and generalise to unseen tasks in a zero-shot or few-shot setting.
K-Means Clustering
K-means clustering is an unsupervised machine learning algorithm that partitions a dataset into k groups by minimising the sum of squared distances between data points and their assigned cluster centroids.
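One common way to carry out this minimisation is Lloyd's algorithm, which alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. The sketch below assumes the initial centroids are supplied by the caller. The function name `kmeans` and the rule of keeping a centroid unchanged when its cluster is empty are choices made for this example.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on tuples of floats, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        assignments = [
            min(
                range(len(centroids)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignments

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centres, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the starting centroids, so practical implementations usually run several random initialisations and keep the best one.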
Knowledge Graph
A structured knowledge representation that encodes entities and their relationships as a directed labelled graph, enabling machines to reason over interconnected facts across diverse domains.
Large Language Models
Large language models (LLMs) are AI systems trained on vast corpora of text to predict and generate natural language. They underpin modern chatbots, code assistants, and generative AI applications.
Layer Normalisation
Layer normalisation is a technique that normalises the inputs across the features of a single training example, stabilising and accelerating the training of deep neural networks, especially transformers.
Long Short-Term Memory (LSTM)
Long Short-Term Memory is a recurrent neural network architecture designed to learn long-range dependencies in sequential data by using gating mechanisms to control information flow.
Machine Learning
Machine learning is a subfield of artificial intelligence in which systems improve their performance on tasks through experience — by automatically learning patterns from data rather than following explicitly programmed rules.
Mamba (Structured State Space Model)
Mamba is a selective state space model architecture that achieves linear-time sequence modelling, offering a computationally efficient alternative to the Transformer for long-context tasks.
Markov Decision Process
A Markov decision process is a mathematical framework for modelling sequential decision-making in which outcomes are partly random and partly under the control of a decision-maker.
Meta-Learning
A machine learning paradigm in which models learn how to learn, acquiring inductive biases across a distribution of tasks so they can adapt rapidly to new tasks with minimal data.
Mixture of Experts
Mixture of Experts (MoE) is a machine learning architecture in which a model routes each input to a small subset of specialised sub-networks called experts, enabling large model capacity at a fraction of the compute cost of a dense model with the same number of parameters.
Monte Carlo Methods
A broad class of computational algorithms that use repeated random sampling to obtain numerical results, widely used in machine learning for Bayesian inference, reinforcement learning, and uncertainty estimation.
Multi-Task Learning
Multi-task learning is a machine learning approach in which a model is trained simultaneously on multiple related tasks, using shared representations to improve generalisation and data efficiency compared to training separate single-task models.
Multimodal AI
Artificial intelligence systems that can process, understand, and generate information across multiple data types simultaneously, including text, images, audio, video, and other modalities.
Natural Language Generation
Natural Language Generation (NLG) is a subfield of artificial intelligence that automatically produces human-readable text from structured data, semantic representations, or other machine-readable inputs.
Natural Language Processing
Natural language processing (NLP) is the subfield of AI concerned with enabling computers to understand, interpret, manipulate, and generate human language in both text and speech form.
Neural Network
A neural network is a computational model inspired by biological brains, composed of interconnected layers of nodes that learn patterns from data through weighted connections.
Neural Scaling Laws
Neural scaling laws are empirical relationships describing how the performance of neural networks improves predictably as a function of model size, dataset size, and compute budget, enabling principled resource allocation for AI training.
Neuro-symbolic AI
Neuro-symbolic AI is a hybrid artificial intelligence paradigm that combines neural network-based learning with symbolic reasoning, integrating the pattern recognition strengths of deep learning with the structured reasoning and interpretability of symbolic methods.
Overfitting
Overfitting is a modelling error in machine learning where a model learns the training data too closely, including its noise, and consequently performs poorly on new, unseen data.
Physical AI
Physical AI is artificial intelligence that perceives, reasons about, and acts upon the physical world through embodied systems such as robots, autonomous vehicles, and automated facilities, bridging digital intelligence and real-world action.
Precision and Recall
Precision and recall are two complementary metrics used to evaluate classification models, measuring respectively the correctness of positive predictions and the completeness with which actual positives are identified.
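The two metrics can be computed from counts of true positives, false positives, and false negatives, as in this sketch. The function name `precision_recall` and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: share of positive predictions that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)  # tp=2, fp=1, fn=2
```

Here two of the three positive predictions are correct (precision 2/3), while only two of the four actual positives are found (recall 1/2).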
Principal Component Analysis
An unsupervised statistical technique that transforms correlated variables into a smaller set of uncorrelated components that preserve as much variance in the original data as possible.
Proximal Policy Optimization
A reinforcement learning algorithm developed by OpenAI that stabilises policy gradient training by constraining the size of policy updates, widely used for fine-tuning large language models through RLHF.
Random Forest
Random forest is an ensemble machine learning algorithm that builds many decision trees on bootstrapped samples and aggregates their predictions to improve accuracy and reduce overfitting.
Recurrent Neural Network
A recurrent neural network (RNN) is a class of neural network designed for sequential data, where connections between nodes form directed cycles allowing information to persist across time steps.
Regularisation (Machine Learning)
Regularisation is a collection of techniques in machine learning that constrain models during training to reduce overfitting and improve generalisation to unseen data.
Reinforcement Learning
A machine learning paradigm in which an agent learns to make sequential decisions by interacting with an environment and optimising for cumulative reward through trial and error.
Reinforcement Learning from Human Feedback
A machine learning technique that trains a reward model from human preference data and uses it to align large language models with human values, safety requirements, and intended behaviour through reinforcement learning.
Residual Network
A deep convolutional neural network architecture introduced by Microsoft Research in 2015 that uses skip connections to enable training of very deep networks; it won the ILSVRC 2015 ImageNet classification challenge with a top-5 error rate of 3.57%.
Responsible AI
A framework of principles and practices that guide the development and deployment of artificial intelligence systems to ensure they are safe, fair, transparent, accountable, and aligned with human values.
Self-Supervised Learning
A machine learning training paradigm in which a model generates its own supervisory signal from unlabelled data by solving pretext tasks, learning rich representations without human-annotated labels.
Sequence-to-Sequence Model
A neural network architecture composed of an encoder that processes an input sequence into a fixed representation and a decoder that generates an output sequence from that representation, forming the foundation for machine translation, summarisation, and dialogue systems.
Softmax Function
The softmax function converts a vector of real-valued scores into a probability distribution, and is widely used as the output layer of neural network classifiers and in attention mechanisms.
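A minimal sketch of the function for a list of scores. Subtracting the maximum score before exponentiating is a common numerical-stability trick that does not change the result. The function name `softmax` is chosen for this example.

```python
import math

def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    m = max(scores)
    # Shifting by the maximum avoids overflow in exp() for large scores.
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

Higher scores receive higher probabilities, and equal scores receive equal probabilities, so `softmax([0.0, 0.0])` gives `[0.5, 0.5]`.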
Sovereign AI
Sovereign AI is the capacity of a nation to develop, deploy, and govern artificial intelligence using its own infrastructure, data, talent, and models, ensuring strategic autonomy and alignment with domestic laws and values.
Sparse Autoencoder
A sparse autoencoder is a type of autoencoder trained with a sparsity constraint that forces most neurons in the hidden layer to be inactive for any given input, which tends to produce a more disentangled and interpretable feature decomposition.
Support Vector Machine
A support vector machine (SVM) is a supervised machine learning algorithm that finds the optimal hyperplane separating data points of different classes by maximising the margin between the boundary and the nearest training examples.
TinyML
TinyML is a field of machine learning focused on running machine learning models on microcontrollers and other resource-constrained edge devices that typically operate with milliwatts of power and kilobytes of memory.
Token
A token is the basic unit of text processed by a large language model, typically a word, subword, or character, and serves as the fundamental input and output element during inference.
Tokenisation
Tokenisation is the process of breaking text into discrete units called tokens — which may represent words, subwords, characters, or symbols — that serve as the fundamental input units for language models and other natural language processing systems.
Transfer Learning
Transfer learning is a machine learning technique in which a model pre-trained on one task or dataset is adapted for a different but related task, enabling high performance with significantly less data and compute than training from scratch.
Transformer Architecture
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel, forming the foundation of modern large language models and multimodal AI systems.
Variational Autoencoder
A variational autoencoder is a generative neural network that learns a probabilistic latent representation of data, enabling smooth sampling and reconstruction of new examples.
Vision Transformer
The Vision Transformer (ViT) is a deep learning model that applies the transformer architecture, originally designed for NLP, directly to sequences of image patches, achieving state-of-the-art results on visual recognition tasks.
Word2Vec
A neural network-based algorithm developed by Google in 2013 that learns dense vector representations of words from large text corpora, capturing semantic and syntactic relationships through distributional similarity.
World Models
World models are AI systems that build internal representations of how the environment works, enabling machines to simulate, plan, and reason about future states without requiring direct experience.
Zero-Shot Learning
Zero-shot learning is a machine learning paradigm in which a model makes accurate predictions on categories it has never seen during training by leveraging semantic descriptions or attribute representations.