The practice of dynamically selecting which large language model should handle a given query in order to balance cost, latency, and output quality across a pool of models.
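
As a rough illustration of the definition above, the following minimal sketch picks the cheapest model in a pool that fits a latency budget and is judged good enough for the query. Everything in it is an assumption made for illustration: the `ModelProfile` fields, the model names and numbers in `POOL`, the `route` function, and the idea that a per-query difficulty score in [0, 1] is already available from some upstream estimator. Real routers may use learned classifiers, cascades, or other policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical description of one model in the pool."""
    name: str                  # illustrative identifier, not a real model
    cost_per_1k_tokens: float  # assumed price in USD
    p50_latency_ms: float      # assumed median latency
    quality: float             # assumed quality score in [0, 1]


def route(difficulty: float, pool: list[ModelProfile],
          max_latency_ms: float) -> ModelProfile:
    """Pick a model for a query with the given estimated difficulty.

    Policy (an assumption, one of many possible):
    1. Keep models within the latency budget; if none fit, consider all.
    2. Among those, return the cheapest whose quality >= difficulty.
    3. If none is good enough, return the highest-quality candidate.
    """
    within_budget = [m for m in pool if m.p50_latency_ms <= max_latency_ms]
    candidates = within_budget or list(pool)
    good_enough = [m for m in candidates if m.quality >= difficulty]
    if good_enough:
        return min(good_enough, key=lambda m: m.cost_per_1k_tokens)
    return max(candidates, key=lambda m: m.quality)


# Made-up pool for demonstration only.
POOL = [
    ModelProfile("small", 0.1, 200.0, 0.60),
    ModelProfile("medium", 0.5, 600.0, 0.80),
    ModelProfile("large", 3.0, 1500.0, 0.95),
]

if __name__ == "__main__":
    # An easy query with a generous latency budget goes to the cheapest model.
    print(route(0.5, POOL, max_latency_ms=2000).name)
    # A hard query with the same budget goes to the strongest model.
    print(route(0.9, POOL, max_latency_ms=2000).name)
```

The sketch treats quality as a hard threshold and cost as the tie-breaker, which keeps the trade-off easy to read; a weighted score over cost, latency, and quality would be an equally plausible design.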