Applying the Nebius AI device topology to Managed Service for Kubernetes clusters with GPU nodes
For Nebius AI virtual machines in GPU clusters, the device topology differs from the default bare-metal one. As a result, without the matching topology, NCCL tests may be unstable and workload performance may suffer.
To run stable NCCL tests as described in the dedicated tutorial and to improve workload performance, apply the Nebius AI topology to your Managed Service for Kubernetes cluster.
Before applying the topology:
- Create a GPU cluster.
- Create a node group that uses the GPU cluster and a VM configuration supported in GPU clusters.
To apply the topology:
- Download the nccl-topo-h100-v1.xml file from the Nebius AI repository.
- Create a namespace for the topology. For example, name it nccl-test:

    kubectl create namespace nccl-test

  Warning: Your Kubernetes resources that will use the topology must be created in the same namespace.
- Create a ConfigMap resource with the topology:

    kubectl create configmap topo-config \
      --from-file=nccl-topo-h100-v1.xml \
      -n nccl-test
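The steps above only store the topology in the cluster. A workload still has to load it. The following is a minimal sketch, assuming your NCCL workload runs as a Pod in the nccl-test namespace. The script writes a Pod manifest that mounts the topo-config ConfigMap as a volume and points NCCL at the mounted file through NCCL's NCCL_TOPO_FILE environment variable. Because the ConfigMap was created with --from-file, its key (and therefore the mounted file name) is nccl-topo-h100-v1.xml. The Pod name nccl-topo-example, the image placeholder, the mount path /etc/nccl-topo, the placeholder command and the request for 8 GPUs through the NVIDIA device plugin's nvidia.com/gpu resource are all hypothetical. Replace them with your own values.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a Pod manifest that mounts the topo-config ConfigMap and sets NCCL_TOPO_FILE.
# Hypothetical values: Pod name, image, mount path, command and GPU count.
cat > nccl-topo-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nccl-topo-example          # hypothetical Pod name
  namespace: nccl-test             # must match the ConfigMap's namespace
spec:
  restartPolicy: Never
  containers:
    - name: nccl
      image: "REPLACE_WITH_NCCL_TEST_IMAGE"   # hypothetical placeholder
      command: ["sleep", "infinity"]         # placeholder: replace with your NCCL workload
      env:
        # NCCL reads this XML file before detecting the topology.
        - name: NCCL_TOPO_FILE
          value: /etc/nccl-topo/nccl-topo-h100-v1.xml
      volumeMounts:
        - name: topo
          mountPath: /etc/nccl-topo          # hypothetical mount path
          readOnly: true
      resources:
        limits:
          nvidia.com/gpu: 8                  # assumes the NVIDIA device plugin and 8 GPUs per node
  volumes:
    - name: topo
      configMap:
        name: topo-config                    # the ConfigMap created above; its key becomes the file name
EOF
```

To create the Pod, run kubectl apply -f nccl-topo-pod.yaml, then check that the file is visible inside the container with kubectl exec -n nccl-test nccl-topo-example -- ls /etc/nccl-topo. Passing the path through NCCL_TOPO_FILE keeps the topology out of the container image, so the same image can be used with or without it.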