Prompt caching is an inference optimisation technique that stores precomputed key-value representations of repeated prompt prefixes, reducing latency and token processing costs for applications with stable system prompts or long shared contexts.
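The saving can be illustrated with a minimal sketch, assuming a toy model in which building a token's key-value entry is reduced to incrementing a counter. The names ToyPrefixCache, run, and tokens_processed are hypothetical and chosen for illustration only; they do not correspond to any provider's API. Real systems cache attention key-value tensors inside the model server.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrefixCache:
    """Toy stand-in for a key-value cache keyed by prompt prefix (hypothetical)."""

    store: dict = field(default_factory=dict)
    tokens_processed: int = 0  # tokens that actually had to be computed

    def _compute_state(self, tokens):
        # Placeholder for the expensive forward pass that builds KV entries.
        self.tokens_processed += len(tokens)
        return [f"kv({t})" for t in tokens]

    def run(self, prefix, suffix):
        # Reuse the stored state when this exact prefix has been seen before.
        key = tuple(prefix)
        if key not in self.store:
            self.store[key] = self._compute_state(prefix)
        # Only the request-specific suffix is always computed from scratch.
        return self.store[key] + self._compute_state(suffix)


cache = ToyPrefixCache()
system_prompt = "You are a helpful assistant .".split()  # stable shared prefix
cache.run(system_prompt, "What is caching ?".split())
cache.run(system_prompt, "Why is it cheaper ?".split())
print(cache.tokens_processed)
```

In this sketch the shared prefix is processed once and reused by the second call, so the counter grows only by the length of each new suffix. The cache matches on the exact prefix, which is why a stable system prompt or a long shared context is the case that benefits.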