Questions and answers about Managed Service for ClickHouse
General questions
What is Managed Service for ClickHouse?
Managed Service for ClickHouse is a service that helps you create, operate, and scale ClickHouse databases in a cloud infrastructure.
With Managed Service for ClickHouse, you can:
- Create a database with the required performance characteristics.
- Scale processing power and storage dedicated to your databases as needed.
- Get database logs.
Managed Service for ClickHouse takes on time-consuming ClickHouse infrastructure administration tasks:
- Monitors resource usage.
- Automatically creates DB backups.
- Provides fault tolerance through automatic failover to backup replicas.
- Keeps database software updated.
You interact with database clusters in Managed Service for ClickHouse the same way you interact with regular databases in your local infrastructure. This allows you to manage internal database settings to meet your app's requirements.
What is ClickHouse used for? Which database should I select?
ClickHouse is designed primarily for analytics (OLAP) and is optimized for adding and reading data. You can update data, but with limitations.
What part of database management and maintenance is Managed Service for ClickHouse responsible for?
When creating clusters, Managed Service for ClickHouse allocates resources, installs the DBMS, and creates databases.
For the created and running databases, Managed Service for ClickHouse automatically creates backups and applies fixes and updates to the DBMS.
Managed Service for ClickHouse also provides data replication between database hosts (both inside and between availability zones) and automatically switches the load over to a backup replica in the event of a failure.
When should I use Managed Service for ClickHouse, and when should I use VMs with databases?
Nebius AI offers two ways to work with databases:
- Managed Service for ClickHouse allows you to operate template databases with no need to worry about administration.
- Compute Cloud virtual machines allow you to create and configure your own databases. With this approach, you can use any database management system, access databases via SSH, and so on.
What is a database host and database cluster?
A database host is an isolated database environment in the cloud infrastructure with dedicated computing resources and reserved data storage.
A database cluster is one or more database hosts between which replication can be configured.
How do I get started with Managed Service for ClickHouse?
Managed Service for ClickHouse is available to any registered Nebius AI user.
To create a database cluster in Managed Service for ClickHouse, you must define its characteristics:
- Host class (performance characteristics such as CPUs, memory, and so on).
- Storage size (reserved in full when you create the cluster).
- The network your cluster will be connected to.
- The number of hosts for the cluster and the availability zone for each host.
For detailed instructions, see Getting started with Managed Service for ClickHouse.
How many DB hosts can a cluster contain?
You can create a cluster with as few as one host.
The maximum number of hosts in a cluster is only limited by the requested computing resources and the size of the storage for the cluster.
For more information, see Quotas and limits in Managed Service for ClickHouse.
How can I access a running DB host?
You can connect to Managed Service for ClickHouse databases using standard DBMS methods.
Learn more about connecting to clusters.
How many clusters can I create within a single cloud?
Technical and organizational limits are listed in Quotas and limits in Managed Service for ClickHouse.
How do I maintain database clusters?
Maintenance in Managed Service for ClickHouse includes:
- Automatic installation of DBMS updates and revisions for DB hosts (including disabled clusters).
- Changes to the host class and storage size.
- Other Managed Service for ClickHouse maintenance activities.
For more information, see Maintenance in Managed Service for ClickHouse.
Which version of ClickHouse does Managed Service for ClickHouse use?
Managed Service for ClickHouse supports ClickHouse 23.8 LTS. For more information, see ClickHouse versioning policy.
Which ClickHouse version should I choose?
We recommend ClickHouse 23.8 LTS. For more information, see ClickHouse versioning policy.
What happens when a new DBMS version is released?
When new minor versions are released, the cluster software is automatically updated after a short testing period.
The owners of the affected DB clusters receive advance notice of expected work times and DB availability.
What happens when a DBMS version becomes deprecated?
When a DBMS version becomes deprecated, Managed Service for ClickHouse automatically sends email notifications to the owners of database clusters created with this version.
New hosts can no longer be created using deprecated DBMS versions. Clusters on a deprecated version of ClickHouse are updated according to the versioning policy.
The owners of the affected DB clusters receive advance notice of expected work times and DB availability.
How can I change the computing resources and storage size for a database cluster?
You can change computing resources and storage size in the management console. All you need to do is choose a different host class for the required cluster.
The cluster characteristics change within 30 minutes. During this period, other maintenance activities, such as installing updates, may also be performed on the cluster.
Can I get logs of my operations with services?
Yes, you can request log records about your resources from Nebius AI services. For more information, see Data requests.
Questions about ClickHouse
Why should I use ClickHouse in Managed Service for ClickHouse rather than my own VM-based installation?
Managed Service for ClickHouse automates routine database maintenance:
- Quick DB deployment with the necessary available resources.
- Data backup.
- Regular software updates.
- DB cluster failover.
- Database usage monitoring and statistics.
When should I use ClickHouse instead of PostgreSQL?
ClickHouse only supports adding and reading data because it is designed primarily for analytics (OLAP). In other cases, it's probably more convenient to use PostgreSQL.
How do I upload data to ClickHouse?
Use the INSERT statement described in the ClickHouse documentation.
How do I upload very large data to ClickHouse?
Use the CLI for efficient data compression during transmission (the recommended frequency is no more than one INSERT command per second).
Data transfer from physical media is not yet supported.
What happens to a cluster if one of its nodes fails?
A DB cluster with at least two replicas continues working if one of its nodes fails.
Data may be lost only if a node with a non-replicated table fails.
Is it possible to deploy a ClickHouse database cluster in multiple availability zones?
Yes, you can. A database cluster may consist of hosts that reside in different availability zones.
How does replication work for ClickHouse?
Replication is managed by ZooKeeper. For replication to work in a Managed Service for ClickHouse cluster, a ZooKeeper cluster with at least three hosts is created.
Access to ZooKeeper and its setup are not available to Nebius AI users.
Why does a ClickHouse cluster take up 3 hosts more than it should?
When creating a ClickHouse cluster with 2 or more hosts, Managed Service for ClickHouse automatically creates a cluster with 3 ZooKeeper hosts to manage replication and fault tolerance. These hosts are taken into account when calculating the consumed cloud resource quota and cluster cost. By default, ZooKeeper hosts are created with a minimal host class.
For more information about using ZooKeeper, see the ClickHouse documentation.
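The host count that goes against the quota can be sketched as follows. This is only an illustration of the rule stated above; the function name is hypothetical, and it assumes that a single-host cluster gets no ZooKeeper hosts, which the text implies but does not state outright.

```python
ZOOKEEPER_HOSTS = 3  # created automatically for clusters with 2 or more ClickHouse hosts


def total_hosts(clickhouse_hosts: int) -> int:
    """Return the number of hosts counted against the quota for one cluster."""
    if clickhouse_hosts < 1:
        raise ValueError("a cluster needs at least one ClickHouse host")
    # Assumption: ZooKeeper hosts are added only when there are 2+ ClickHouse hosts.
    if clickhouse_hosts >= 2:
        return clickhouse_hosts + ZOOKEEPER_HOSTS
    return clickhouse_hosts
```

For example, a cluster requested with two ClickHouse hosts is counted as five hosts.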
How do I delete data in ClickHouse based on TTL?
Data is deleted based on TTL either in entire data chunks or during merges.
Deleting entire data chunks is more efficient and uses fewer server resources, but it requires the value of the TTL expression and the partitioning key to match.
Deletions during merges use more resources and are carried out with regular background merges or during unscheduled merges. Merge frequency depends on the value of the merge_with_ttl_timeout parameter. This parameter is set at table creation.
We recommend always configuring TTL so that obsolete data is deleted in entire chunks. To do this, set ttl_only_drop_parts to true when creating tables.
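As a rough sketch of this recommendation, the helper below renders a CREATE TABLE statement in which the TTL expression lines up with monthly partitions, so all rows of a partition expire at the same time. The helper name, the table name, and the column names are hypothetical; the statement uses standard ClickHouse MergeTree syntax, but you should adapt the schema and interval to your data.

```python
def ttl_table_ddl(table: str, date_column: str, keep_months: int) -> str:
    """Render a CREATE TABLE statement whose TTL lines up with monthly partitions."""
    if keep_months < 1:
        raise ValueError("keep_months must be a positive integer")
    return (
        f"CREATE TABLE {table}\n"
        "(\n"
        f"    {date_column} Date,\n"
        "    payload String\n"
        ")\n"
        "ENGINE = MergeTree\n"
        # One partition per calendar month.
        f"PARTITION BY toYYYYMM({date_column})\n"
        f"ORDER BY {date_column}\n"
        # Every row of a month expires at the same moment, so whole parts can be dropped.
        f"TTL toStartOfMonth({date_column}) + INTERVAL {keep_months} MONTH\n"
        "SETTINGS ttl_only_drop_parts = 1;"
    )


print(ttl_table_ddl("events", "event_date", 3))
```

The printed statement can be run through clickhouse-client or any other supported connection method.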
Can I use JSON data for tables in ClickHouse?
Yes, you can. However, JSON is currently an experimental data type in ClickHouse. To allow creating tables of this type, run this query:
SET allow_experimental_object_type=1;
Note
SET queries are not supported when connecting to a cluster through the management console. To run such a query, use a different cluster connection method, e.g., through clickhouse-client.
Make sure you have the latest client version installed.
For more information, see the ClickHouse documentation.
Why is a cluster working slowly even though it still has free computing resources?
The maximum storage IOPS and bandwidth values may be insufficient for processing the current number of requests. In this case, throttling is triggered and the performance of the entire cluster degrades.
The maximum IOPS and bandwidth values increase by a fixed value when the storage size increases by a certain step. The step and increment values depend on the disk type:
Disk type | Step, GB | Max IOPS increase (read/write) | Max bandwidth increase (read/write), MB/s |
---|---|---|---|
network-hdd | 256 | 300/300 | 30/30 |
network-ssd | 32 | 1,000/1,000 | 15/15 |
To increase the maximum IOPS and bandwidth values and make throttling less likely, increase the storage size when you update your cluster.
If you are using the network-hdd storage type, consider switching to network-ssd by restoring the cluster from a backup.
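To estimate how much a storage increase raises the limits, the step arithmetic from the table can be sketched like this. The function name is hypothetical, and the sketch assumes that only whole steps count and that there is no upper cap; the text does not state the base limits, so only the increase is computed.

```python
# Step size (GB) and per-step increments, copied from the table above.
DISK_STEPS = {
    "network-hdd": {"step_gb": 256, "iops": 300, "bandwidth_mbs": 30},
    "network-ssd": {"step_gb": 32, "iops": 1000, "bandwidth_mbs": 15},
}


def limit_increase(disk_type: str, old_gb: int, new_gb: int) -> tuple[int, int]:
    """Return (extra IOPS, extra MB/s) gained by growing storage from old_gb to new_gb."""
    spec = DISK_STEPS[disk_type]
    # Assumption: only whole steps count, so a partial step adds nothing.
    extra_steps = new_gb // spec["step_gb"] - old_gb // spec["step_gb"]
    return extra_steps * spec["iops"], extra_steps * spec["bandwidth_mbs"]
```

Under these assumptions, growing a network-ssd disk from 64 GB to 128 GB adds two steps, that is, 2,000 IOPS and 30 MB/s.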
Connection
Is it possible to connect to individual ClickHouse hosts?
Yes. You can connect to ClickHouse cluster hosts:
- Using the HTTPS interface:
  - Via an encrypted SSL connection on port 8443.
  - Without encryption through port 8123.
- Using the command-line client:
  - Via an encrypted SSL connection on port 9440.
  - Without encryption through port 9000.
SSH connections are not supported.
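The port choices above can be captured in a small lookup, shown here as an illustrative sketch. The function name, the interface labels, and the host name in the usage note are hypothetical; only the port numbers come from the list above.

```python
# (interface, ssl) -> port, as listed above.
PORTS = {
    ("http", True): 8443,     # HTTPS interface, encrypted
    ("http", False): 8123,    # HTTP interface, no encryption
    ("native", True): 9440,   # command-line client, encrypted
    ("native", False): 9000,  # command-line client, no encryption
}


def endpoint(host: str, interface: str, ssl: bool = True) -> str:
    """Build the address to connect to for the given interface and encryption mode."""
    port = PORTS[(interface, ssl)]
    if interface == "http":
        scheme = "https" if ssl else "http"
        return f"{scheme}://{host}:{port}"
    # The native protocol has no URL scheme; clients take host and port.
    return f"{host}:{port}"
```

For instance, endpoint("ch-host.example.internal", "native") gives the host and port to pass to the command-line client for an encrypted connection.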
Why can't I connect to a host from the internet?
Most likely, no public access is enabled for the cluster, so you can only connect to it from a VM in Nebius AI. You can only request public access when creating a new host in your cluster.
How do I connect to a non-public host in Nebius AI?
Connect to a host from a VM in Nebius AI hosted in the same cloud network, or add a new cluster host with public access and connect to a non-public host through it.
Can I connect to a public cluster without SSL?
No. You can only connect to public hosts using an SSL connection. For more information, see the documentation.
Editing clusters
How do I add a host to a cluster?
To add a host, follow the instructions. You can also add new hosts to a cluster when creating a shard.
Can I set join_use_nulls to 1 using the CLI?
Yes. To do this, when creating a user or updating user settings, pass the desired join_use_nulls setting value in the --settings parameter. For example:
ncp managed-clickhouse user update <username> \
--cluster-name=<cluster name> \
--settings="join_use_nulls=1"
For more information, see the documentation.
Is a cluster available when being updated?
If it is a multi-host cluster, there is no downtime while updating it, since the hosts are updated one by one. Only individual hosts are unavailable when the cluster is being restarted.
How do I change the time zone?
Follow the steps described in Changing ClickHouse settings.
Is a cluster unavailable when adding replicas?
Yes, there is a short downtime when restarting a cluster.
How do I grant a user read-only permissions?
To do this, when creating or editing a user via the CLI, pass readonly=1 in the --settings parameter. For example:
ncp managed-clickhouse user update <username> \
--cluster-name=<cluster name> \
--settings="readonly=1"
For more information, see the documentation.
How do I increase the memory limit?
Update the user settings and set the desired Max memory usage parameter value.
Can I change the disk type?
No, you can only select the disk type when creating a cluster or restoring it from a backup.
Can I change a network and subnets?
No, you can only select a network and subnets for hosts when creating a cluster or restoring it from a backup.
Cluster configuration
How do I grant a user permissions to create and delete tables or databases?
Go to the cluster settings, enable the Managing users via SQL option, and grant a user the appropriate permissions using the GRANT statement.
How do I find out the internal_replication setting value?
The internal_replication setting information is not available in the Nebius AI interfaces or the ClickHouse system tables. The default setting value is true.
How do I increase the maximum amount of RAM to run a query?
If the amount of RAM is not sufficient for running a query, the following error occurs:
DB::Exception: Memory limit (total) exceeded:
would use 14.10 GiB (attempt to allocate chunk of 4219924 bytes), maximum: 14.10 GiB.
(MEMORY_LIMIT_EXCEEDED), Stack trace (when copying this message, always include the lines below)
To increase the maximum amount of RAM, use the Max memory usage parameter.
If the User management via SQL option is enabled for the cluster, you can set the Max memory usage parameter:
- For the current user session by running this query:
SET max_memory_usage = <value in bytes>;
- For all users by default by creating a settings profile.
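Because the setting takes a value in bytes, a small conversion helper can reduce mistakes. This is a minimal sketch with a hypothetical function name; it only formats the SET query shown above and assumes the limit is given in GiB, the unit used in the error message.

```python
GIB = 1024 ** 3  # bytes in one GiB


def max_memory_usage_query(limit_gib: float) -> str:
    """Format a SET query that raises the per-query memory limit to limit_gib GiB."""
    if limit_gib <= 0:
        raise ValueError("the memory limit must be positive")
    return f"SET max_memory_usage = {int(limit_gib * GIB)};"
```

For example, max_memory_usage_query(16) produces a query that sets the limit to 17179869184 bytes.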
Moving and restoring a cluster
How do I back up a ClickHouse database?
Backups are created every 24 hours and stored for seven days after being created. You can restore data only as of backup creation time.
Is DB host backup enabled by default?
Yes, backup is enabled by default. For ClickHouse, a full backup is performed once a day, and you can restore a cluster from any saved backup.
When are backups performed? Is a DB cluster available during backup?
The backup window is an interval during which a full daily backup of the DB cluster is performed. The backup window is from 22:00 to 02:00 (UTC+00:00).
Clusters remain fully accessible during the backup window.
How many backups are stored in Managed Service for ClickHouse? For how long?
The size and number of backups are not limited. All backups (automatic and manual) are stored for seven days.
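The backup window and the retention period can be expressed as a short sketch, for example to schedule heavy jobs outside the window or to check when a backup expires. The function names are hypothetical, and the sketch assumes the window end (02:00 UTC) is exclusive and that datetimes are timezone-aware.

```python
from datetime import datetime, time, timedelta, timezone

WINDOW_START = time(22, 0)     # 22:00 UTC
WINDOW_END = time(2, 0)        # 02:00 UTC on the following day
RETENTION = timedelta(days=7)  # all backups are kept for seven days


def in_backup_window(moment: datetime) -> bool:
    """Return True if a timezone-aware moment falls inside the daily backup window."""
    t = moment.astimezone(timezone.utc).time()
    # The window crosses midnight, so it is the union of two ranges.
    return t >= WINDOW_START or t < WINDOW_END


def backup_expires_at(created: datetime) -> datetime:
    """Return the moment a backup created at `created` stops being stored."""
    return created + RETENTION
```

Clusters stay accessible during the window, so this check is useful only for planning, not for availability.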
What does a daily backup include?
Backup data is stored only for the MergeTree engine family. For other engines, backups only store table schemas. For more information, see Backups in Managed Service for ClickHouse.
Why does it take a long time to restore a cluster from a backup?
The average speed when recovering a cluster from a backup is about 100 Mbps. Cluster recovery time may vary significantly depending on the host class and the nature of data being stored in the DB.
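For a rough idea of what this speed means in practice, the estimate below converts a backup size into a restore time. It is a back-of-the-envelope sketch with a hypothetical function name; it assumes that Mbps means megabits per second and that sizes are in decimal gigabytes, and actual times may differ significantly for the reasons given above.

```python
RESTORE_SPEED_MBPS = 100  # megabits per second, the average quoted above


def estimated_restore_seconds(backup_size_gb: float) -> float:
    """Rough restore time in seconds for a backup of the given size in decimal GB."""
    size_megabits = backup_size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    return size_megabits / RESTORE_SPEED_MBPS
```

Under these assumptions, a 100 GB backup takes about 8,000 seconds, or a little over two hours, to restore.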
Monitoring and logs
What metrics and processes can be tracked using monitoring?
For all DBMS types, you can track:
- CPU, memory, network, or disk usage, in absolute terms.
- Memory, network, or disk usage as a percentage of the set limits for the corresponding cluster's host class.
- The amount of data in the DB cluster and the remaining free space in data storage.
For DB hosts, you can track metrics specific to the corresponding type of DBMS. For example, for PostgreSQL, you can track:
- Average query execution time.
- Number of queries per second.
- Number of errors in logs.
Monitoring can be performed with a minimum granularity of 5 seconds.
What is the retention period for logs?
Cluster logs are stored for 30 days.
How do I track the amount of free storage space on ZooKeeper hosts?
Follow the instructions in Monitoring the state of ClickHouse clusters and hosts to track the host state.