1 result for “LLM inference”
Prompt caching is an inference optimisation technique that stores precomputed key-value representations of repeated prompt prefixes, reducing latency and token processing costs for applications with stable system prompts or long shared contexts.