Marketplace

Ray Cluster

Updated April 25, 2024

Ray is an open-source distributed computing framework for deploying and orchestrating scalable environments for a variety of large-scale AI workloads. Ray Cluster provides robust infrastructure for training complex machine learning models and running reinforcement learning algorithms at scale. By leveraging Kubernetes orchestration capabilities, Ray Cluster simplifies deployment, allowing users to efficiently allocate resources and manage workloads across clusters. With support for distributed execution and parallelism, Ray Cluster optimizes resource utilization and accelerates model training, enabling faster iteration and experimentation in AI research and development.
You can deploy KubeRay, the Kubernetes operator officially supported by Ray, in your Nebius AI Managed Service for Kubernetes clusters using this Marketplace product.

Warning

Before installing Ray Cluster, you must install NVIDIA® GPU Operator on the cluster. For details, see the deployment instructions below.

Deployment instructions

Before installing this product:

  1. Create a Kubernetes cluster and a node group with GPUs in it. The product supports the following VM platforms with GPUs:

    • NVIDIA® H100 NVLink with Intel Sapphire Rapids (Types A, B, C)
    • NVIDIA® V100 NVLink with Intel Cascade Lake
    • NVIDIA® V100 PCIe with Intel Broadwell

    Note

    It is strongly recommended that each node has at least 4 vCPUs and 8 GB of RAM.

  2. Install kubectl and configure it to work with the created cluster.
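Before proceeding, you can verify that the GPU Operator is running and that the cluster nodes expose GPU resources. A minimal check, assuming the GPU Operator was installed into the gpu-operator namespace (adjust the namespace if yours differs):

```shell
# Confirm that the NVIDIA GPU Operator pods are up
kubectl -n gpu-operator get pods

# Confirm that at least one node advertises the nvidia.com/gpu resource
kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```

If the GPUS column shows `<none>` for every node, the GPU Operator has not yet finished provisioning the nodes.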

To install the product:

  1. Click the button in this card to go to the cluster selection form.

  2. Select your cluster and click Continue.

  3. Configure the application:

    • Namespace: Select a namespace or create one.

    • Application name: Enter an application name.

    • Head pod vCPUs: Enter the number of vCPUs that the head pod will use on its node. Default value: 4.

    • Head pod RAM: Enter the amount of RAM that the head pod will use on its node. Default value: 8Gi.

      Note

      It is strongly recommended that you keep the default values for the head pod so that it occupies an entire node. For more details, see the Ray documentation.

    • GPU worker platform: Select the same VM platform that you selected when creating the node group with GPUs.

    • Max. number of GPU workers: Enter the maximum number of worker pods with GPUs. Each worker will use one GPU.

    • Disable non-GPU workers: If this option is selected, only worker pods with GPUs will be created, and the following settings for non-GPU workers will be ignored.

    • Max. number of non-GPU workers: Enter the maximum number of worker pods without GPUs. Default value: 3.

    • Non-GPU worker vCPUs: Enter the number of vCPUs that each worker pod without GPUs will use on its node. Default value: 16.

    • Non-GPU worker RAM: Enter the amount of RAM that each worker pod without GPUs will use on its node. Default value: 30Gi.

      Note

      It is strongly recommended that you keep the default values for worker pods without GPUs so that each one occupies an entire node. For more details, see the Ray documentation.

    • Ray Docker image: Enter the URL of a custom Ray Docker image for the head and worker pods. The image must carry version 2.9.3 of Ray. By default, the official rayproject/ray-ml:2.9.3-gpu image hosted in the Nebius AI container registry is used. For more details about Ray Docker images, see the Ray documentation.

  4. Click Install.

  5. Wait for the application to change its status to Deployed.

  6. To check that the Ray cluster is working, access the Ray dashboard:

    1. Set up port forwarding:

      kubectl -n <namespace> port-forward \
        services/<application_name>-kuberay-head-svc 8265:8265
      
    2. Go to http://localhost:8265/ in your web browser.
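With the port forward active, you can also exercise the cluster through the Ray Jobs API, which listens on the same port as the dashboard. A quick smoke test, assuming the Ray CLI (version 2.9.3, matching the cluster image) is installed on your local machine:

```shell
# Submit a trivial job through the forwarded dashboard port
ray job submit --address http://localhost:8265 -- \
  python -c "import ray; ray.init(); print(ray.cluster_resources())"
```

The job's output lists the CPU and GPU resources the cluster currently exposes, which confirms that the head pod and worker pods are connected.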

Billing type
Free
Type
Kubernetes® Application
Category
Training
LLM apps framework
Publisher
Nebius
Use cases
  • Reinforcement learning research and development.
  • Distributed model training for deep learning applications.
  • High-performance computing for scientific simulations and data analysis.
  • Large-scale data processing and analytics.
  • Experimentation with parallel algorithms and distributed systems.
  • Development and deployment of AI-powered applications in production environments.
Technical support

Nebius AI does not provide technical support for the product. If you have any issues, please refer to the developer’s information resources.

Product composition

Helm chart (version 1.1.0):
  • cr.nemax.nebius.cloud/yc-marketplace/nebius/ray-cluster/chart/ray-cluster

Docker images:
  • cr.nemax.nebius.cloud/yc-marketplace/nebius/ray-cluster/rayproject/ray-ml (version 2.9.3-gpu)
  • cr.nemax.nebius.cloud/yc-marketplace/nebius/ray-cluster/kuberay/operator (version v1.1.0)
  • cr.nemax.nebius.cloud/yc-marketplace/nebius/ray-cluster/redis (version 7.2.4-debian-12-r9)
Terms
By using this product you agree to the Nebius AI Marketplace Terms of Service and the terms and conditions of the following software: Apache 2.0