Managing ML models for inference in Managed Service for ClickHouse
Managed Service for ClickHouse allows you to analyze data by applying CatBoost machine learning models with the modelEvaluate() function.
Before adding a model
Managed Service for ClickHouse only works with readable models uploaded to Object Storage:
- Upload the trained model file to Object Storage.
- Get a link to the model file.
Getting a list of models in a cluster
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and select the Inference tab in the left-hand panel.
If you don't have the Nebius AI command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get a list of models in a cluster, run the command:
ncp managed-clickhouse ml-model list --cluster-name=<cluster name>
You can get the cluster name from the list of clusters in the folder.
Getting detailed information about a model
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and select the Inference tab in the left-hand panel.
If you don't have the Nebius AI command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get model details, run this command:
ncp managed-clickhouse ml-model get <model name> \
--cluster-name=<cluster name>
You can get the model name from the list of models in the cluster and the cluster name from the list of clusters in the folder.
Adding a model
Note
The only supported model type is CatBoost: ML_MODEL_TYPE_CATBOOST.
- Select the cluster:
  - In the management console, go to the folder page and select Managed Service for ClickHouse.
  - Click the cluster name and select the Inference tab in the left-hand panel.
  - Click Inference.
- Configure the model parameters:
  - Type: ML_MODEL_TYPE_CATBOOST.
  - Name: Model name. The model name is one of the arguments of the modelEvaluate() function, which is used to call the model in ClickHouse.
  - URL: Model address in Object Storage.
- Click Add and wait for the model to be added.
If you don't have the Nebius AI command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To add a model to a cluster, run the command:
ncp managed-clickhouse ml-model create <model name> \
--cluster-name=<cluster name> \
--type=ML_MODEL_TYPE_CATBOOST \
--uri=<link to model file in Object Storage>
You can get the cluster name from the list of clusters in the folder.
Applying a model
To apply the model to data stored in a ClickHouse cluster:
- Connect to the cluster using the ClickHouse CLI client.
- Execute an SQL query like:
  SELECT modelEvaluate('<model name>', <name of column 1>, <name of column 2>, ... <name of column N>) FROM <table name>
Specify the model name and the names of the columns with input data as the modelEvaluate() function arguments. The query returns a column with model predictions for each row of the source table.
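When the list of input columns is long, the query can be assembled by a small script rather than typed by hand. The sketch below is only an illustration: build_model_query is a hypothetical helper (not part of ClickHouse or the ncp CLI) that joins a model name, a table name, and column names into the query shape shown above. The model, table, and column names in the sample call are taken from the example at the end of this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of ClickHouse or the ncp CLI):
# prints a modelEvaluate() query for the given model, table, and columns.
# Usage: build_model_query <model name> <table name> <column>...
build_model_query() {
    model=$1
    table=$2
    shift 2
    columns=""
    for col in "$@"; do
        # Join the column names with ", "
        columns="${columns:+$columns, }$col"
    done
    printf "SELECT modelEvaluate('%s', %s) FROM %s\n" "$model" "$columns" "$table"
}

# Print the query for two of the columns from the example table.
build_model_query ml_test ml_test_table RESOURCE MGR_ID

# To run the query, pass it to clickhouse-client with your own
# connection parameters, for example:
# clickhouse-client --host <host FQDN> --secure --port 9440 \
#     --user <DB username> --password <DB user password> \
#     --database <DB name> -q "$(build_model_query ml_test ml_test_table RESOURCE MGR_ID)"
```

The helper inserts names verbatim and does no quoting or validation, so it should only be given trusted model, table, and column names.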
Updating a model
Managed Service for ClickHouse doesn't track changes in the model file located in the Object Storage bucket.
To update the contents of a model that is already connected to the cluster:
- Upload the file with the updated model to Object Storage.
- Get a link to this file.
- Change the parameters of the model connected to Managed Service for ClickHouse by providing a new link to the model file.
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and select the Inference tab in the left-hand panel.
- Select the appropriate model, open its menu, and select Edit.
If you don't have the Nebius AI command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To change the link to the model file in the Object Storage bucket, run the command:
ncp managed-clickhouse ml-model update <model name> \
--cluster-name=<cluster name> \
--uri=<new link to file in Object Storage>
You can get the model name from the list of models in the cluster and the cluster name from the list of clusters in the folder.
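Because the service does not pick up file changes on its own, the relink step is easy to script once the new file is uploaded. The following is a minimal sketch, not an official tool: relink_model and run are hypothetical helper names, and the cluster name and URL in the sample call are placeholders. It uses only the ml-model update and ml-model get commands shown on this page, and with DRY_RUN=1 it prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: point an existing model at a new file,
# then show the model details to confirm the change.
# Usage: relink_model <model name> <cluster name> <new link to file>
relink_model() {
    run ncp managed-clickhouse ml-model update "$1" \
        --cluster-name="$2" \
        --uri="$3" &&
    run ncp managed-clickhouse ml-model get "$1" \
        --cluster-name="$2"
}

# Dry run with placeholder values (cluster name and URL are made up).
DRY_RUN=1
relink_model ml_test my-cluster "https://example.com/catboost_model_v2.bin"
```

Unset DRY_RUN (or set it to 0) to actually run the commands against a cluster.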
Disabling a model
Note
After disabling a model, the corresponding object is kept in the Object Storage bucket. If you no longer need this model object, you can delete it.
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and select the Inference tab in the left-hand panel.
- Select the appropriate model, open its menu, and select Delete.
If you don't have the Nebius AI command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To disable a model, run the command:
ncp managed-clickhouse ml-model delete <model name> \
--cluster-name=<cluster name>
You can get the model name from the list of models in the cluster and the cluster name from the list of clusters in the folder.
Example
If you don't have a suitable data set or model to process it, you can test model inference in Managed Service for ClickHouse using this example. We prepared a data file for it and trained a model to analyze it. You can upload data to ClickHouse and see model predictions for different rows of the table.
Note
In this example, we'll use public data from the Amazon Employee Access Challenge. The model predicts the value of the ACTION column. The same data and model are used on GitHub.
To upload data to ClickHouse and test the model:
- Install the ClickHouse CLI and configure your cluster connection as described in the documentation.
- Download the file with data to analyze:
  wget https://storage.ai.nebius.cloud/doc-files/managed-clickhouse/train.csv
- Create a table for the data:
  clickhouse-client --host <host FQDN> \
    --database <DB name> \
    --secure \
    --user <DB username> \
    --password <DB user password> \
    --port 9440 \
    -q 'CREATE TABLE ml_test_table (date Date MATERIALIZED today(), ACTION UInt8, RESOURCE UInt32, MGR_ID UInt32, ROLE_ROLLUP_1 UInt32, ROLE_ROLLUP_2 UInt32, ROLE_DEPTNAME UInt32, ROLE_TITLE UInt32, ROLE_FAMILY_DESC UInt32, ROLE_FAMILY UInt32, ROLE_CODE UInt32) ENGINE = MergeTree() PARTITION BY date ORDER BY date'
- Upload the data to the table:
  clickhouse-client --host <host FQDN> \
    --database <DB name> \
    --secure \
    --user <DB username> \
    --password <DB user password> \
    --port 9440 \
    -q 'INSERT INTO ml_test_table FORMAT CSVWithNames' \
    < train.csv
- In the management console, add the test model:
  - Type: ML_MODEL_TYPE_CATBOOST
  - Name: ml_test
  - URL: https://storage.ai.nebius.cloud/managed-clickhouse/catboost_model.bin
Test the model:
- Connect to the cluster using the client ClickHouse CLI.
- Test the model using queries:
-
Predicted values for first 10 rows of the
ACTION
column:SELECT modelEvaluate('ml_test', RESOURCE, MGR_ID, ROLE_ROLLUP_1, ROLE_ROLLUP_2, ROLE_DEPTNAME, ROLE_TITLE, ROLE_FAMILY_DESC, ROLE_FAMILY, ROLE_CODE) > 0 AS prediction, ACTION AS target FROM ml_test_table LIMIT 10
-
Predicted probability for the first 10 rows of the table:
SELECT modelEvaluate('ml_test', RESOURCE, MGR_ID, ROLE_ROLLUP_1, ROLE_ROLLUP_2, ROLE_DEPTNAME, ROLE_TITLE, ROLE_FAMILY_DESC, ROLE_FAMILY, ROLE_CODE) AS prediction, 1. / (1 + exp(-prediction)) AS probability, ACTION AS target FROM ml_test_table LIMIT 10
-
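The second query converts the raw model output into a probability with the expression 1 / (1 + exp(-prediction)). To get a feel for this mapping outside ClickHouse, the sketch below applies the same formula to a few sample values with awk. The to_probability helper is a hypothetical name introduced here for illustration, and the input values are made up rather than real model outputs.

```shell
#!/bin/sh
# Hypothetical helper: applies the same formula as the query,
# 1 / (1 + exp(-prediction)), to a raw model output.
to_probability() {
    # LC_ALL=C keeps the decimal point independent of the user's locale.
    LC_ALL=C awk -v x="$1" 'BEGIN { printf "%.4f\n", 1 / (1 + exp(-x)) }'
}

to_probability 0      # prints 0.5000
to_probability 2.5    # positive raw value: probability above 0.5
to_probability -2.5   # negative raw value: probability below 0.5
```

A raw value of 0 maps to a probability of exactly 0.5, so the `> 0` comparison in the first query is equivalent to a 0.5 probability threshold.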