Search Results
2 results for “high-throughput”
Companies & Tools
Groq
Groq is an American AI inference company that developed the Language Processing Unit (LPU), a custom silicon architecture optimised for high-throughput, low-latency inference of large language models using on-chip SRAM rather than external DRAM.
5 min read · Updated June 2026
Infrastructure
vLLM
vLLM is an open-source library for fast and memory-efficient large language model inference and serving, built around the PagedAttention algorithm for optimised GPU memory management.
6 min read · Updated June 2026