Search Results
16 results for “transformer”
Attention Mechanism
A neural network technique that enables models to dynamically weight the relevance of different parts of an input sequence when producing each output element, forming the core of transformer architectures.
BERT
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer-based language model developed by Google that reads text bidirectionally to understand word context in natural language tasks.
Encoder-Decoder Architecture
A neural network design pattern that compresses an input sequence into an internal representation using an encoder, and then generates an output sequence from that representation using a decoder, foundational to machine translation, summarisation, and many other sequence-to-sequence tasks.
Flash Attention
FlashAttention is an IO-aware exact attention algorithm that computes attention in tiles held in fast on-chip memory instead of materialising the full attention matrix, reducing GPU memory use from quadratic to linear in sequence length and cutting wall-clock time for transformer models on long sequences.
Hugging Face
An American AI company and open-source platform that hosts machine learning models, datasets, and applications, widely described as the "GitHub of machine learning" for its role as the central repository of the open AI community.
KV Cache
A KV cache (key-value cache) is a memory optimisation used in transformer inference that stores the key and value tensors already computed by the attention mechanism for earlier tokens, eliminating redundant recomputation when generating tokens sequentially.
Large Language Models
Large language models (LLMs) are AI systems trained on vast corpora of text to predict and generate natural language. They underpin modern chatbots, code assistants, and generative AI applications.
Layer Normalisation
Layer normalisation is a technique that normalises the inputs across the features of a single training example, stabilising and accelerating the training of deep neural networks, especially transformers.
LoRA (Low-Rank Adaptation)
LoRA is a parameter-efficient fine-tuning technique that adapts large pre-trained models by freezing their weights and injecting small trainable low-rank matrices into transformer layers, drastically reducing the number of trainable parameters while typically matching or approaching the quality of full fine-tuning.
Mamba (Structured State Space Model)
Mamba is a selective state space model architecture that achieves linear-time sequence modelling, offering a computationally efficient alternative to the Transformer for long-context tasks.
Mixture of Experts
Mixture of Experts (MoE) is a machine learning architecture in which a model routes each input to a small subset of specialised sub-networks called experts, enabling large model capacity at a fraction of the compute cost of a dense model of the same size.
Natural Language Processing
Natural language processing (NLP) is the subfield of AI concerned with enabling computers to understand, interpret, manipulate, and generate human language in both text and speech form.
Optical Character Recognition
A computer vision technology that converts images of typed, handwritten, or printed text into machine-readable digital text, increasingly powered by deep learning and transformer-based vision models.
Sentence Transformers
Sentence Transformers are neural network models that encode sentences, paragraphs, or short documents into fixed-length dense vector embeddings optimised for semantic similarity comparison.
Transformer Architecture
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel, forming the foundation of modern large language models and multimodal AI systems.
Vision Transformer
The Vision Transformer (ViT) is a deep learning model that applies the transformer architecture, originally designed for NLP, directly to sequences of image patches, achieving results competitive with state-of-the-art convolutional networks on image recognition when pre-trained on large datasets.