AIWiki

Search Results

7 results for "token"

Foundations

Context Window

The context window is the maximum number of tokens (including the prompt, prior conversation, retrieved documents, and the model's own output) that a large language model can attend to at one time.
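
As a rough illustration of that definition, the sketch below adds up the token counts that share one window and checks them against the limit. The function names and the 8,192-token window are assumptions chosen for the example, not values taken from any particular model.

```python
def fits_context(prompt_tokens: int, history_tokens: int, retrieved_tokens: int,
                 max_output_tokens: int, context_window: int) -> bool:
    """Return True if every token source together fits inside the window."""
    total = prompt_tokens + history_tokens + retrieved_tokens + max_output_tokens
    return total <= context_window


def max_output_budget(used_tokens: int, context_window: int) -> int:
    """Tokens left for the model's output once the input has been counted."""
    return max(0, context_window - used_tokens)


if __name__ == "__main__":
    WINDOW = 8192  # hypothetical window size, for illustration only
    print(fits_context(1000, 2000, 4000, 1000, WINDOW))
    print(max_output_budget(8000, WINDOW))
```

The point of the sketch is that the output reservation counts against the same limit as the input, so a longer prompt leaves less room for the reply.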

5 min read · Updated May 2026
Infrastructure

KV Cache

A KV cache (key-value cache) is a memory optimisation used in transformer inference that stores pre-computed key and value tensors from the attention mechanism, eliminating redundant recomputation when generating tokens sequentially.
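
The saving can be sketched with a toy single-head attention in plain Python. Everything here is an assumption made for illustration: the 2x2 weight matrices `W_K` and `W_V`, the use of the raw embedding as the query, and the helper names. The sketch only shows that caching the projected keys and values gives the same outputs with fewer projections.

```python
import math

# Hypothetical toy projection weights (illustrative values only).
W_K = [[1.0, 0.0], [0.5, 1.0]]
W_V = [[0.0, 1.0], [1.0, 0.5]]


def project(x, weight):
    """Multiply a vector by a small weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all given positions."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def decode_without_cache(embeddings):
    """Recompute every key and value at every step."""
    outputs, projections = [], 0
    for t in range(len(embeddings)):
        keys = [project(x, W_K) for x in embeddings[: t + 1]]
        values = [project(x, W_V) for x in embeddings[: t + 1]]
        projections += 2 * (t + 1)
        outputs.append(attend(embeddings[t], keys, values))
    return outputs, projections


def decode_with_cache(embeddings):
    """Project each new token once and append the result to the cache."""
    keys, values, outputs, projections = [], [], [], 0
    for x in embeddings:
        keys.append(project(x, W_K))
        values.append(project(x, W_V))
        projections += 2
        outputs.append(attend(x, keys, values))
    return outputs, projections
```

For a sequence of n tokens the uncached loop performs n(n+1) projections while the cached loop performs 2n, which is the redundant recomputation the definition refers to.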

6 min read · Updated June 2026
Companies & Tools

MiniMax

A Chinese AI company and model developer known for the MiniMax-M1 and M2 large language models featuring ultra-long context windows of up to 4 million tokens, strong agentic performance, and open MIT-licensed releases.

5 min read · Updated June 2026
Infrastructure

Prompt Caching

Prompt caching is an inference optimisation technique that stores precomputed key-value representations of repeated prompt prefixes, reducing latency and token processing costs for applications with stable system prompts or long shared contexts.
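
A minimal sketch of the idea, assuming an exact-match cache keyed by a hash of the prefix. The `PrefixCache` class and `toy_prefill` function are hypothetical; `toy_prefill` stands in for the expensive step that would produce real key-value tensors, and production systems differ in how they match and expire prefixes.

```python
import hashlib


class PrefixCache:
    """Store the result of processing a prompt prefix, keyed by its hash."""

    def __init__(self, prefill):
        self._prefill = prefill  # expensive function run on a cache miss
        self._store = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix: str):
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._prefill(prefix)
        return self._store[key]


def toy_prefill(prefix: str):
    # Placeholder for real key-value tensors: one number per whitespace-split word.
    return [len(word) for word in prefix.split()]


if __name__ == "__main__":
    cache = PrefixCache(toy_prefill)
    system_prompt = "You are a helpful assistant."
    cache.lookup(system_prompt)  # first request: computed and stored
    cache.lookup(system_prompt)  # second request: served from the cache
    print(cache.hits, cache.misses)
```

Only requests that share the stored prefix exactly benefit in this sketch, which is why stable system prompts are the typical use case.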

6 min read · Updated June 2026
Infrastructure

Speculative Decoding

Speculative decoding is an inference acceleration technique in which a small draft model proposes several candidate tokens that a larger target model then verifies in parallel, with typical reported speedups of roughly 2-4x and no change to the target model's output.
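
The propose-then-verify loop can be sketched for the greedy case, where a drafted token is accepted only if it equals the target model's own next token. The two toy models and all function names below are assumptions for illustration. A real system verifies the whole draft in one batched target pass and usually uses a probabilistic acceptance rule, whereas this sketch calls the target once per position for clarity.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], num_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       num_new: int, lookahead: int = 4) -> List[int]:
    tokens = list(prompt)
    goal = len(prompt) + num_new
    while len(tokens) < goal:
        # 1. The draft model proposes a short run of tokens.
        proposal: List[int] = []
        for _ in range(min(lookahead, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each position; keep the matching prefix.
        for i, guess in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if guess != expected:
                tokens.extend(proposal[:i])
                tokens.append(expected)  # replace the first wrong guess
                break
        else:
            tokens.extend(proposal)  # every drafted token was accepted
    return tokens


def toy_target(context: List[int]) -> int:
    """Hypothetical target model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


def toy_draft(context: List[int]) -> int:
    """Hypothetical draft model: agrees with the target except after a 4."""
    return 0 if context[-1] == 4 else (context[-1] + 1) % 10
```

Because every emitted token is either confirmed or supplied by the target, the result matches `greedy_decode(toy_target, ...)` exactly; the draft model only affects how many tokens are settled per round.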

5 min read · Updated June 2026
Foundations

Token

A token is the smallest unit of text that a large language model processes, typically a word, subword, or character. Tokens are the model's fundamental input and output elements during inference.

6 min read · Updated June 2026
Foundations

Tokenisation

Tokenisation is the process of breaking text into discrete units called tokens — which may represent words, subwords, characters, or symbols — that serve as the fundamental input units for language models and other natural language processing systems.
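
The three granularities mentioned above can be sketched with toy tokenisers. These are illustrative assumptions rather than how any production tokeniser works: the word splitter is a simple regular expression, and the subword splitter is a greedy longest-match over a hypothetical hand-written vocabulary, not a trained algorithm such as BPE.

```python
import re


def word_tokens(text):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text):
    """Treat every character as its own token."""
    return list(text)


def subword_tokens(word, vocab):
    """Greedy longest-match split; unknown text falls back to single characters."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary or one character long.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


if __name__ == "__main__":
    print(word_tokens("Hello, world!"))
    print(subword_tokens("tokenisation", {"token", "is", "ation"}))
```

The same input yields different token counts at each granularity, which is why token-based limits and costs depend on the tokeniser in use.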

6 min read · Updated May 2026