Search Results
5 results for “GPU”
CUDA
NVIDIA's parallel computing platform and programming model that lets developers use GPUs for general-purpose computation, underpinning most modern deep learning frameworks.
Flash Attention
FlashAttention is an IO-aware exact attention algorithm that restructures the standard attention computation into memory-efficient tiled blocks, dramatically reducing GPU memory usage and wall-clock time for transformer models on long sequences.
GPU Cluster
A GPU cluster is a networked group of servers, each containing one or more graphics processing units, purpose-built to accelerate parallel computation workloads such as deep learning training and large-scale AI inference.
NVIDIA
An American technology company and the world's dominant supplier of graphics processing units (GPUs) for artificial intelligence training and inference, responsible for the CUDA parallel computing platform and a broad ecosystem of AI hardware and software.
OpenVINO
OpenVINO is an open-source toolkit developed by Intel for optimising and deploying deep learning inference across Intel hardware, including CPUs, GPUs, Neural Processing Units, and FPGAs, with broad support for major AI frameworks and model formats.