ML Inference

Inference is the application of a trained model to new data to obtain predictions or conclusions.

Run inference on Nebius AI cloud infrastructure for real-time or on-demand predictions, decision-making, and other production workloads.

Why Nebius AI is the right choice

GPUs for different workloads

We provide AI-tailored NVIDIA GPUs such as the L40, A100, and H100.

Support and onboarding assistance

We provide onboarding, help with complex cases, and guidance on optimizing platform usage, reducing your problem-solving time.

Fair and affordable pricing

Get the most value with our reserve pricing models. Contact our sales team for an offer tailored to your needs.

How to choose a GPU for inference

V100

A good fit for small and medium-size models that do not require BF16 precision support.

Much more affordable than the other GPU types.
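
At runtime you can detect whether the current GPU supports BF16 and fall back to FP16 on cards such as the V100. A minimal sketch, assuming PyTorch:

```python
# Pick an inference dtype based on what the GPU supports.
# V100 (compute capability 7.0) lacks BF16, so we fall back to FP16 there.
import torch

def pick_inference_dtype() -> torch.dtype:
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16  # A100/H100 and newer
    return torch.float16       # e.g. V100: FP16 only

dtype = pick_inference_dtype()
print(f"Running inference in {dtype}")
```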

A100

For inference on memory-intensive models under moderate load, the A100 can be more cost-effective than the H100.
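
A quick back-of-the-envelope estimate of the weight memory footprint helps decide whether a model fits on a given card. A sketch with hypothetical numbers (a 7-billion-parameter model in 16-bit precision needs roughly 13 GB for weights alone):

```python
# Rough estimate (hypothetical numbers): do the model's weights fit in
# GPU memory? Remember to leave headroom for the KV cache and activations.
params = 7e9          # model parameters (assumed)
bytes_per_param = 2   # FP16/BF16
weights_gb = params * bytes_per_param / 1024**3
print(f"Weights: ~{weights_gb:.1f} GB")  # ~13.0 GB
```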

H100

Best choice if speed is your top priority.

Perfect for large NLP workloads, LLMs, and other models built on the Transformer architecture.
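
A minimal sketch of Transformer inference on a GPU in half precision, assuming the Hugging Face transformers library; the "gpt2" checkpoint is only a stand-in for whichever model you deploy:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a causal LM in FP16 and move it to the GPU for inference.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", torch_dtype=torch.float16
).to("cuda").eval()

inputs = tokenizer("Inference is", return_tensors="pt").to("cuda")
with torch.no_grad():  # no gradients needed at inference time
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```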

H200

The world’s most powerful GPU for supercharging AI and HPC workloads is coming soon!

Solution architecture

This set of Nebius AI services ensures better inference performance for your model.

Let’s find the best possible technical solution

If you want to use a specific database or third-party software for your project, our team of solution architects is here to assist you every step of the way.

FAQ and basic terminology

What is machine learning inference?

Machine learning inference is the process of applying a trained machine learning model to new, unseen data to make predictions or decisions.

It’s a crucial step in machine learning pipelines: the model, previously trained on historical data, is used to provide real-time predictions or insights.
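
A minimal illustration of the train-then-infer split, assuming scikit-learn on a toy dataset: the model is fitted once on historical data, and inference is then a call to predict() on new, unseen samples.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split the data: X_train stands in for historical data,
# X_new for fresh, unseen samples arriving at inference time.
X, y = load_iris(return_X_y=True)
X_train, X_new, y_train, _ = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # training
predictions = model.predict(X_new)                               # inference
print(predictions[:5])
```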