Compute Cloud

This service provides secure and scalable computing capacity for hosting, testing and prototyping your projects.

GPU-accelerated computing instances use top-of-the-line NVIDIA® GPUs, such as the NVIDIA® H100 Tensor Core GPU, and are designed specifically for AI training, deep learning and other high-performance computing workloads.


Latest GPUs available

Solve complex computing problems with thousands of NVIDIA® H100 Tensor Core GPUs connected in a full mesh without any oversubscription, over the latest InfiniBand network with up to 3.2 Tbit/s per host.


Unlimited scaling

Scale effortlessly from one to eight GPUs in a single virtual machine, or expand to thousands in InfiniBand clusters. Choose between reserving guaranteed capacity and adapting flexibly with a pay-as-you-go model.

Convenient control

Manage your VMs in the console, via the CLI or with popular tools such as Terraform, Packer or Jenkins. Choose the number of cores, disks and GPUs and the amount of RAM you need. Easily monitor resource utilization and associated costs.

ML/AI cycle use cases where this service is essential

ML and AI workloads require extensive computational resources, as they often involve large datasets and complex algorithms that take a long time to process. Compute Cloud provides storage options that make it easy to access, retrieve and organize data for processing.

Works with

Object Storage

AI model training requires significant computational power. With Compute Cloud and readily available NVIDIA H100 GPUs, you can scale resources up or down to match your specific needs. You can also quickly launch virtual machines with pre-configured environments, frameworks and libraries, reducing the time it takes to set up and deploy your AI models.

Nebius AI offers multiple deployment options for AI models. Models can be deployed on cloud instances or embedded within web applications, and they remain available for inference even during peak loads or hardware failures.

Works with

Managed Kubernetes

Best of NVIDIA GPUs available

L40S

A great choice for inference of modern generative AI models with intensive loads.

V100

A cost-effective choice for inference and fine-tuning of time-proven models that do not require BF16 precision support.

A100

Effective for inference and fine-tuning of conventional models with moderate loads.

H100 with InfiniBand

Perfect for all model production and operational tasks, whether using a single GPU or thousands in a GPU cluster.

H200

Best if speed is your top priority. Coming soon!

B200

Next-level performance for training and inference. Coming soon!

Intuitive cloud console for a smooth user experience

Create a VM with an operating system optimized for your tasks and monitor GPU usage.

Create VM now

Need custom pricing for a large-scale project?

Leave your contact details, and our cloud specialists will get back to you promptly with transparent, personalised pricing that meets your specific needs.

Start your journey with these in-depth guides

Getting started

Create a VM with a top-of-the-line NVIDIA® GPU in the management console.

Step-by-step guides

Check out our step-by-step guides that will help you with routine operations.

Concepts overview

Learn more about concepts and resources of the service.

Questions and answers about Compute Cloud

What is Compute Cloud?

Compute Cloud by Nebius AI is a scalable, high-performance virtual machine service that enables you to host, test and prototype your AI and ML projects on demand.

How does Nebius AI differ from regular hosting?

Which GPU should I choose?

Here are our general recommendations:

H100 is best for big NLP models, LLMs and Transformers, and when you need an InfiniBand GPU cluster,
A100 provides a better price-performance ratio for conventional CNN networks,
V100 is good for small and medium-size models that do not require BF16 precision support,
L40 is a great fit for generative AI inference and visual computing workloads,
L4 is a low-cost universal GPU that supports a wide range of use cases: the entry ticket to the world of GPU computing.

These comparisons all depend on your specific criteria: resource needs, deployment environment and workload combinations. Contact our sales team to find the most cost-effective solution for your projects.

Why is GPU memory important?

GPU-accelerated AI applications keep trained neural networks in GPU memory so that they can run the required tasks on demand.

NVIDIA's GPU programming framework, CUDA, runs small programs, called kernels, on the GPU to perform calculations such as physics simulation or linear algebra.

GPU-accelerated simulations, for example CAE or physics workloads, use CUDA applications and hold their calculation data in GPU memory.

Much of this data is kept in GPU memory, so a larger GPU memory lets you do more at the same time. Learn more about the specifications of the GPUs we provide.
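As a rough illustration of why memory size matters, the sketch below estimates the memory needed just to hold a model's weights. The function names, the 7-billion-parameter example and the 80 GB capacity figure are assumptions chosen for illustration, not values from this page. Real usage is higher, because activations, caches and optimizer state also live in GPU memory.

```python
# Rough estimate of the GPU memory needed to hold model weights only.
# Assumption: activations, caches and optimizer state are ignored,
# so real requirements are higher than the figure computed here.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weights_memory_gb(num_params: int, dtype: str = "bf16") -> float:
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_gpu(num_params: int, gpu_memory_gb: float, dtype: str = "bf16") -> bool:
    """Return True if the weights alone fit into a single GPU's memory."""
    return weights_memory_gb(num_params, dtype) <= gpu_memory_gb


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model
    gpu_gb = 80             # assumed capacity of a single GPU
    print(f"bf16 weights: {weights_memory_gb(params):.1f} GB")
    print(f"fits on one {gpu_gb} GB GPU: {fits_on_gpu(params, gpu_gb)}")
```

If the weights alone do not fit on one card, the model has to be split across several GPUs, which is one of the cases where a GPU cluster becomes necessary.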

What is a GPU cluster?

A GPU cluster combines many GPUs, up to thousands, into one computing environment. This may be required for distributed training of AI models, and for inference when a single compute node does not have enough compute power or GPU memory. In addition to the compute nodes and their GPUs, a fast interconnect is needed to move data between the nodes.

Nebius AI provides advanced GPU clusters with NVIDIA® H100 Tensor Core GPUs and an InfiniBand interconnect running at 400 Gbit/s per card.
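The per-card and per-host figures can be related with simple arithmetic. The sketch below assumes one InfiniBand card per GPU and eight GPUs per host, which is an assumption consistent with the numbers quoted on this page rather than a documented configuration. It also applies the standard idealised ring all-reduce estimate, in which each GPU sends 2(N-1)/N of the payload, to show why interconnect speed matters for distributed training. Latency, protocol overhead and intra-host links are ignored, and all names are hypothetical.

```python
# Back-of-the-envelope interconnect arithmetic for a GPU cluster.

CARD_GBIT_PER_S = 400  # per-card InfiniBand rate quoted on this page
CARDS_PER_HOST = 8     # assumption: one InfiniBand card per GPU, 8 GPUs per host


def host_bandwidth_tbit_per_s(cards: int = CARDS_PER_HOST,
                              card_gbit: float = CARD_GBIT_PER_S) -> float:
    """Aggregate per-host bandwidth in Tbit/s."""
    return cards * card_gbit / 1000


def ring_allreduce_seconds(payload_gb: float, num_gpus: int,
                           link_gbit: float = CARD_GBIT_PER_S) -> float:
    """Idealised ring all-reduce time for a payload given in gigabytes.

    Each GPU sends 2 * (N - 1) / N of the payload over its own link.
    Latency and protocol overhead are ignored, so this is a lower bound.
    """
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    sent_gbit = 2 * (num_gpus - 1) / num_gpus * payload_gb * 8
    return sent_gbit / link_gbit


if __name__ == "__main__":
    print(f"per-host bandwidth: {host_bandwidth_tbit_per_s()} Tbit/s")
    # Hypothetical case: synchronising 14 GB of gradients across 16 GPUs.
    print(f"all-reduce lower bound: {ring_allreduce_seconds(14, 16):.3f} s")
```

Under these assumptions, eight cards at 400 Gbit/s each add up to the 3.2 Tbit/s per host mentioned earlier. The all-reduce estimate scales inversely with link speed, so a slower interconnect directly lengthens every synchronisation step.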
