Fast, affordable AI inference at scale by Nebius AI Studio

Use hosted open-source models to achieve faster and more accurate inference results than with proprietary APIs. Benefit from our competitive pricing and lightning-fast time to first token in Europe.

Save 3x on input tokens

You pay only for what you use, which helps you stay within budget and makes Inference Service a great fit for RAG and other context-heavy scenarios.

Achieve ultra-low latency

Our highly optimized serving pipeline guarantees a fast time to first token in Europe, where our data center is located, and beyond.

Verified model quality

We perform a set of tests to ensure high accuracy with a diverse range of open-source models.

Choose speed or economy

We offer a choice between the fast flavor, for quicker results at a higher cost, and the base flavor, for slower but more economical processing.

No MLOps experience required

Benefit from simplicity with our production-ready infrastructure that’s already set up and ready to use.

Get expert support

Reach out to our professional services team if you need help with building a custom solution tailored to your business needs.

Benchmark-backed performance and cost efficiency

Faster time to first token in Europe than competitors

Lower cost than GPT-4o with comparable quality on Llama-405B

Competitive input token price for Meta-Llama-405B

Top open-source models available

Meta
Llama-3.1-8B-instruct
A small yet powerful language model with results better than GPT-3.5 and many larger models.

128k context

Llama 3.1 License

Meta
Llama-3.1-405B-instruct
The largest and most powerful open model, comparable to GPT-4 and Claude 3.5 Sonnet.

128k context

Llama 3.1 License

Mistral
Mistral-Nemo-Instruct-2407
Outperforming larger models of its generation, this model shows the potential of compact architectures.

128k context

Apache 2.0 License

Mistral
Mixtral-8x22B-Instruct-v0.1
A Mixture-of-Experts (MoE) model well suited to coding and math, with a particular focus on multilingual capabilities.

65k context

Apache 2.0 License

Ai2
OLMo-7B-Instruct
A fully open-source model with all training data and processes published.

2k context

Apache 2.0 License

Microsoft
Phi-3-mini-4k-instruct
Trained on synthetic and high-quality web-sourced data, this model shows strength in reasoning and long context.

4k context

MIT License

DeepSeek
DeepSeek-Coder-V2-Lite-Instruct
A lightweight and fast version of the most powerful model for coding questions.

128k context

DeepSeek license

Nebius
More to come
We are constantly working on adding new models.

A simple and friendly UI for a smooth user experience

Sign up and start testing, comparing and running AI models in your applications.


Familiar API at your fingertips

import os

import openai

# The endpoint is OpenAI-compatible: point the official SDK at the
# Nebius AI Studio base URL and authenticate with your API key.
client = openai.OpenAI(
    api_key=os.environ.get("NEBIUS_API_KEY"),
    base_url="https://api.studio.nebius.ai/v1",
)

# Request a chat completion from the fast flavor of Llama-3.1-8B-Instruct.
completion = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "What is the answer to all questions?"
    }],
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-fast",
)

print(completion.choices[0].message.content)
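
For interactive applications, the same OpenAI-compatible client can stream tokens as they are generated. This is a minimal sketch rather than an official snippet, and it assumes the endpoint supports the SDK's standard streaming mode:

import os

import openai

client = openai.OpenAI(
    api_key=os.environ.get("NEBIUS_API_KEY"),
    base_url="https://api.studio.nebius.ai/v1",
)

# stream=True yields chunks as the model produces them, so the first
# tokens can be shown to the user before the full answer is ready.
stream = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Write a haiku about fast inference."
    }],
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-fast",
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()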

Optimize costs with our flexible pricing

Playground

Playground is an easy way to try out the AI models available in Nebius AI Studio without writing any code. Receive up to 1 million tokens* in welcome credit when you sign up; spend them exploring our product in the Playground or on your inference workloads through the API.

Two flavors

Choose between fast and base flavors to suit your project needs. Fast flavor delivers quicker results for time-sensitive tasks, while base flavor offers economical processing for larger workloads.
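
In the API, the flavor is part of the model identifier: the snippet above addresses the fast flavor through the "-fast" suffix on meta-llama/Meta-Llama-3.1-8B-Instruct-fast. Below is a minimal sketch of switching flavors in code, assuming the base flavor is reached by the plain, unsuffixed model name:

def model_for_flavor(base_name: str, flavor: str = "base") -> str:
    # "-fast" mirrors the API example above; treating the unsuffixed name
    # as the base flavor is an assumption made for illustration.
    return f"{base_name}-fast" if flavor == "fast" else base_name

# model_for_flavor("meta-llama/Meta-Llama-3.1-8B-Instruct", flavor="fast")
# -> "meta-llama/Meta-Llama-3.1-8B-Instruct-fast"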

Check out available models and prices

Model                            Flavor  Input tokens (per 1M)  Output tokens (per 1M)
llama-3.1-8b-instruct            fast    $0.13                  $0.40
                                 base    $0.04                  $0.12
llama-3.1-70B-instruct           fast    $0.60                  $1.80
                                 base    $0.40                  $1.20
llama-3.1-405b-instruct          fast    -                      -
                                 base    $2.50                  $7.50
mistral-nemo-instruct-2407       fast    $0.16                  $0.48
                                 base    $0.08                  $0.24
mixtral-8x7B-instruct-v0.1       fast    $0.40                  $1.20
                                 base    $0.17                  $0.50
mixtral-8x22b-instruct-v0.1      fast    $0.80                  $2.40
                                 base    $0.43                  $1.30
OLMo-7B-Instruct                 fast    -                      -
                                 base    $0.08                  $0.24
phi-3-mini-4k-instruct           fast    $0.13                  $0.40
                                 base    $0.04                  $0.12
deepseek-coder-v2-lite-instruct  fast    $0.40                  $1.20
                                 base    $0.20                  $0.60
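
To see how the per-million-token prices translate into spend, here is a back-of-the-envelope estimate in code. The prices are copied from the table above; the model choice and token counts are invented purely for illustration:

# Prices in USD per 1M tokens, copied from the table above (base flavor).
BASE_PRICES = {
    "llama-3.1-8b-instruct": {"input": 0.04, "output": 0.12},
    "llama-3.1-70B-instruct": {"input": 0.40, "output": 1.20},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough cost estimate in USD for a workload, given token counts."""
    price = BASE_PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

# Example: a RAG workload sending 2M input tokens and receiving 0.5M output
# tokens on llama-3.1-8b-instruct (base): 2 x $0.04 + 0.5 x $0.12 = $0.14
print(f"${estimate_cost('llama-3.1-8b-instruct', 2_000_000, 500_000):.2f}")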

Q&A about Inference Service

Can I use your service for large production workloads?

Absolutely, our service is designed specifically for large production workloads.

Welcome to Nebius AI Studio

Nebius AI Studio is a new product from Nebius designed to help foundation-model users and app builders simplify building applications on top of these models. Our first release, Inference Service, provides endpoints for the most popular AI models.

* — The amount of the welcome credit depends on the specific model and flavor used.