Ray (Distributed Computing Framework)
Ray is an open-source framework for scaling Python and machine learning workloads from a laptop to large clusters, providing unified libraries for distributed training, tuning, serving and reinforcement learning.
Overview
Ray is an open-source distributed computing framework that lets developers scale Python applications, and machine learning workloads in particular, from a single machine to a multi-node cluster with minimal code changes. It was originally created at the University of California, Berkeley, and is now stewarded by the company Anyscale. Ray provides a unified programming model that hides much of the complexity of parallelisation, fault tolerance and resource management.
Ray addresses a practical problem in modern AI: training, tuning and serving large models often exceed the capacity of a single computer, yet writing distributed code by hand is difficult and error-prone. Ray exposes simple primitives that let ordinary Python functions and classes run in parallel across a cluster.
Architecture and libraries
At its foundation, Ray Core provides two key abstractions. Tasks are stateless functions executed remotely and asynchronously, while actors are stateful worker processes that maintain data across calls. A distributed scheduler places this work on available CPUs and GPUs, and a shared object store holds task arguments and results so that workers can exchange data efficiently.
Built on top of Ray Core is a set of specialised libraries that span the machine learning lifecycle.
| Library | Purpose |
| --- | --- |
| Ray Train | Distributed model training across multiple GPUs and nodes |
| Ray Tune | Scalable hyperparameter tuning and search |
| Ray Serve | Deploying and scaling models for online inference |
| Ray Data | Distributed data loading and preprocessing |
| Ray RLlib | Reinforcement learning at scale |
This integrated stack lets teams use one framework for data processing, training, tuning and deployment rather than stitching together separate tools.
Adoption
Ray is widely used in large-scale AI infrastructure. It coordinates training and serving at major technology companies including Uber, Netflix and Shopify, and several leading AI laboratories use it to manage the distributed training of large language models. In 2025 the project joined the PyTorch Foundation, reflecting its position in the open machine learning ecosystem. Its appeal lies in scaling existing Python code without rewriting it for a specific cluster technology, and in unifying workloads that would otherwise require multiple systems.
Strengths and limitations
Ray's strengths are a gentle learning curve for Python users, flexibility across diverse workloads, and strong integration with libraries such as PyTorch, TensorFlow and Hugging Face. Its limitations include operational complexity at very large cluster sizes and the need for careful tuning of memory and the object store for demanding jobs. For simple single-machine work it can add unnecessary overhead.
References
- Moritz, P. et al. (2018). Ray: A Distributed Framework for Emerging AI Applications. OSDI.
- Anyscale. (2025). Ray documentation. ray.io.
- PyTorch Foundation. (2025). Ray joins the PyTorch Foundation. Announcement.