Kubeflow is an open-source platform dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. It provides an ML stack for Kubernetes, consisting of TensorFlow, Jupyter notebooks, and other ML tools. With Kubeflow, data scientists and engineers can build, train, and deploy machine learning models in a consistent and reproducible manner across different environments. Kubeflow enables you to run end-to-end ML workflows on Kubernetes, from data preparation to model training and serving, using the same infrastructure and tools. It simplifies the deployment process by abstracting away the complexities of managing Kubernetes resources, allowing you to focus on developing and iterating ML models.
You can deploy Kubeflow in your Nebius AI Managed Service for Kubernetes clusters using this Marketplace product. The product deploys Kubeflow through deployKF.
Warning
Before installing Kubeflow, you must install NVIDIA® GPU Operator and Argo CD on the cluster. For details, see the deployment instructions below.
1. If you want to use InfiniBand networking for GPUs in your Kubernetes cluster, create a GPU cluster.
2. Create a Kubernetes cluster and a node group with GPUs in it. If you have created a GPU cluster, select it when creating the node group.
3. Install kubectl and configure it to work with the created cluster.
4. Install the required Marketplace products on the cluster in the following order:
   - NVIDIA® Network Operator (if you use a GPU cluster for InfiniBand networking)
   - Argo CD with the following requirements:
     - During the installation, the application and its namespace must be named `argocd`.
     - You must not change the administrator's password.
5. Create an Object Storage bucket to serve as the repository for Kubeflow Pipelines artifacts (the pipeline root).
6. Configure access to the created Object Storage bucket:
   - Create a static access key for the service account that will access the bucket.
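Before you move on to the installation, you can check some of the prerequisites from the command line. The following sketch defines hypothetical helper functions that are not part of the product. It assumes that kubectl is configured for the target cluster and that the GPU Operator exposes GPUs through the standard `nvidia.com/gpu` node resource.

```shell
#!/bin/sh
# Hypothetical pre-installation checks; not part of the Marketplace product.
# Assumes kubectl is already configured for the target cluster.

# Print each node with its allocatable NVIDIA GPUs.
# "<none>" means the node does not advertise the nvidia.com/gpu resource.
list_gpu_nodes() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}

# Succeed only if the "argocd" namespace required by the Argo CD step exists.
check_argocd_namespace() {
  if kubectl get namespace argocd >/dev/null 2>&1; then
    echo "argocd namespace: found"
  else
    echo "argocd namespace: missing (install Argo CD into 'argocd' first)" >&2
    return 1
  fi
}

# Run all checks; stops at the first failure.
check_kubeflow_prereqs() {
  kubectl config current-context || return 1
  list_gpu_nodes || return 1
  check_argocd_namespace
}
```

After sourcing the file, run `check_kubeflow_prereqs`. A node that shows `<none>` in the `GPUS` column does not advertise any GPUs yet.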
To install the product:
1. Click the button in this card to go to the cluster selection form.
2. Select your cluster and click Continue.
3. Configure the application:
   - Namespace: Select a namespace or create one.
   - Application name: Enter an application name.
   - Storage bucket name: Enter the name of the Object Storage bucket you created earlier.
   - S3 access key: Paste the ID of the static access key you created earlier.
   - S3 secret key: Paste the secret key of the static access key you created earlier.
   - Kubeflow hostname: Enter the domain name that deployKF, part of this product, will use. Default: `deploykf.example.com`.
   - Kubeflow admin password: Create and enter a password for the Kubeflow `admin` user. The administrator can manage profiles and contributors of other users. For more details on multi-user isolation, see the Kubeflow documentation.
   - Kubeflow user1 password: Create and enter a password for the Kubeflow `user1` user.
4. Click Install.
5. Wait for the application status to change to `Deployed`. This may take some time. You can monitor the progress in the Argo CD UI; for instructions on how to access it, see the Argo CD deployment instructions.
6. To check that Kubeflow is working, access the deployKF dashboard. To do this, modify the hosts file on your machine and configure port forwarding. You can log in to the dashboard as either `admin@example.com` or `user1@example.com` with the respective passwords you entered earlier. For more details, see the deployKF documentation.
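The status check and the dashboard access steps above can be scripted roughly as follows. This is a sketch with hypothetical function names. The gateway namespace, the service name, and the `8080:http 8443:https` port mapping follow deployKF's documented defaults for local access and are assumptions here. Check the deployKF documentation if your setup differs.

```shell
#!/bin/sh
# Hypothetical helpers for checking the deployment and reaching the deployKF
# dashboard. The gateway values below are assumptions based on deployKF's
# documented defaults for local access.

GATEWAY_NS="deploykf-istio-gateway"   # assumption: deployKF default namespace
GATEWAY_SVC="svc/deploykf-gateway"    # assumption: deployKF default service

# List the Argo CD applications created for the deployment.
# Look for "Synced" and "Healthy" in the status columns.
show_argocd_apps() {
  kubectl get applications.argoproj.io --namespace argocd
}

# Append "127.0.0.1 HOST" for each HOST not already mapped in HOSTS_FILE.
# Usage: add_hosts_entries HOSTS_FILE HOST [HOST ...]
add_hosts_entries() {
  hosts_file="$1"
  shift
  for host in "$@"; do
    # Escape dots so that the hostname is matched literally.
    host_re=$(printf '%s' "$host" | sed 's/\./\\./g')
    if ! grep -Eq "^[^#]*[[:space:]]${host_re}([[:space:]]|\$)" "$hosts_file"; then
      printf '127.0.0.1 %s\n' "$host" >> "$hosts_file"
    fi
  done
}

# Forward local ports 8080 and 8443 to the gateway's "http" and "https" ports.
# Blocks until interrupted with Ctrl+C.
forward_gateway() {
  kubectl port-forward --namespace "$GATEWAY_NS" "$GATEWAY_SVC" 8080:http 8443:https
}
```

For example, after sourcing the file, run `add_hosts_entries /etc/hosts deploykf.example.com` as a user who can write to `/etc/hosts`. Then run `forward_gateway` and, assuming the defaults above, open `https://deploykf.example.com:8443/` in a browser.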
Users and profiles
After installing Kubeflow, you can create users and profiles.
In this product, users are managed by Dex.
1. Get the current Dex configuration:
   `kubectl get secret dex-config -n deploykf-auth -o jsonpath="{.data.config\.yaml}" | base64 --decode > config.yaml`
2. Modify the resulting `config.yaml` file to add users. For details, see the example configuration in Dex's GitHub repository.
3. Apply the modified configuration:
   `kubectl create secret generic dex-config --from-file=config.yaml --namespace=deploykf-auth --dry-run=client -o yaml | kubectl apply -f -`
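Users are defined in the `staticPasswords` list of the Dex configuration. The helper below is a hypothetical sketch that prints one entry using the field layout from Dex's example configuration (`email`, `hash`, `username`, `userID`). You generate the bcrypt hash and the user ID yourself.

```shell
#!/bin/sh
# Hypothetical helper: print one "staticPasswords" entry for Dex's config.yaml.
# Paste the output under the existing "staticPasswords:" key, matching the
# indentation of the entries that are already there.
#
# Inputs you generate yourself (not part of the product):
#   - a bcrypt hash, e.g. with the recipe from Dex's example configuration:
#       echo 'my-password' | htpasswd -BinC 10 user2 | cut -d: -f2
#   - a unique user ID, e.g. from uuidgen

# Usage: dex_static_user EMAIL USERNAME BCRYPT_HASH USER_ID
dex_static_user() {
  if [ "$#" -ne 4 ]; then
    echo "usage: dex_static_user EMAIL USERNAME BCRYPT_HASH USER_ID" >&2
    return 1
  fi
  # Single-quoted YAML scalars keep the "$2y$..." hash exactly as given.
  printf '%s\n' \
    "- email: '$1'" \
    "  hash: '$3'" \
    "  username: '$2'" \
    "  userID: '$4'"
}
```

Dex generally reads its configuration at startup. After you apply the modified secret, you may need to restart the Dex pods in the `deploykf-auth` namespace for the change to take effect. This is an assumption about this product's setup; `kubectl get deployments -n deploykf-auth` shows which deployment to restart.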
To create profiles:
1. Create a file with the profile configuration and apply it as described in the Kubeflow documentation.
2. Set up the new profiles to use the existing credentials (the static access key) for Kubeflow Pipelines:
   1. Get the secret that contains the credentials:
      `kubectl get secret cloned--kubeflow-pipelines--backend-object-store-auth -n kubeflow -o yaml > secret.yaml`
   2. In the resulting `secret.yaml` file, replace the namespace name (`kubeflow`) with the name of a profile. Each profile's namespace has the same name as the profile.
   3. Add the modified secret to the cluster:
      `kubectl apply -f secret.yaml`
   4. Repeat substeps 2 and 3 for each profile you created.
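The profile steps above can be combined into one helper. In this sketch, the function names and the minimal Profile manifest (API version `kubeflow.org/v1`, a single `User` owner) are assumptions for illustration; see the Kubeflow documentation for the full Profile specification. In addition to replacing the namespace, the filter drops the server-populated `resourceVersion`, `uid`, and `creationTimestamp` fields, because the API server generally refuses to create an object whose `resourceVersion` is already set. If the exported secret contains an `ownerReferences` block, remove it manually; the filter does not handle it.

```shell
#!/bin/sh
# Hypothetical helpers for creating a profile and copying the Kubeflow
# Pipelines credentials into its namespace. Function names are illustrative.

SRC_SECRET="cloned--kubeflow-pipelines--backend-object-store-auth"
SRC_NS="kubeflow"

# Print a minimal Profile manifest (assumed shape: one User owner).
# Usage: profile_manifest PROFILE_NAME OWNER_EMAIL
profile_manifest() {
  cat <<EOF
apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: $1
spec:
  owner:
    kind: User
    name: $2
EOF
}

# Read a Secret exported with "kubectl get ... -o yaml" on stdin and print a
# copy for namespace $1, without the server-populated metadata fields.
retarget_secret() {
  sed -e "s/^  namespace: .*/  namespace: $1/" \
      -e '/^  resourceVersion:/d' \
      -e '/^  uid:/d' \
      -e '/^  creationTimestamp:/d'
}

# Usage: setup_profile PROFILE_NAME OWNER_EMAIL
setup_profile() {
  profile_manifest "$1" "$2" | kubectl apply -f - || return 1

  # The profile controller creates the namespace (same name as the profile)
  # asynchronously, so poll for it for up to about a minute.
  tries=0
  until kubectl get namespace "$1" >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge 30 ]; then
      echo "namespace $1 did not appear" >&2
      return 1
    fi
    sleep 2
  done

  kubectl get secret "$SRC_SECRET" --namespace "$SRC_NS" -o yaml \
    | retarget_secret "$1" \
    | kubectl apply -f -
}
```

For example, `setup_profile team-1 user1@example.com` creates the profile `team-1` owned by `user1@example.com` and copies the credentials into the `team-1` namespace.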
Uninstalling
Warning
Do not uninstall Kubeflow while it is being installed. Your cluster may become corrupted as a result.
To uninstall Kubeflow, uninstall the products in the following order:
1. Kubeflow
2. Argo CD
3. NVIDIA® Network Operator (if installed)
4. NVIDIA® GPU Operator
Some custom resource definitions (CRDs) may still remain in your cluster after uninstalling Kubeflow.
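To find the leftover CRDs, you can filter the cluster's CRD list by API group. This sketch only lists candidates. The groups in `LEFTOVER_GROUPS` are an illustrative assumption, not a complete list of what the product installs. The `argoproj.io` group also contains Argo CD's own CRDs, so run this after Argo CD has been uninstalled. Deleting a CRD with `kubectl delete crd NAME` also deletes all custom resources of that type, so review the output first.

```shell
#!/bin/sh
# Hypothetical helper: list CRDs that may remain after uninstalling Kubeflow.
# LEFTOVER_GROUPS is an illustrative assumption (an ERE of API groups), not a
# complete list of the groups the product installs. It only prints names.

LEFTOVER_GROUPS='kubeflow\.org|argoproj\.io'

# Read "kubectl get crd -o name" output on stdin; print the CRD names whose
# API group matches one of the groups in LEFTOVER_GROUPS.
match_crds() {
  sed 's|^customresourcedefinition\.apiextensions\.k8s\.io/||' \
    | grep -E "\\.(${LEFTOVER_GROUPS})\$"
}

list_leftover_crds() {
  kubectl get crd -o name | match_crds
}
```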
Use cases
- Automated ML model deployment and scaling on Kubernetes clusters.
- Serving ML models at scale in production environments.
- Building and deploying custom machine learning pipelines for various applications.
- Implementing continuous integration and continuous deployment (CI/CD) workflows for ML models.
- Integrating with existing Kubernetes-based infrastructure for seamless deployment and management of ML workloads.
- Developing and deploying machine learning applications for natural language processing (NLP), computer vision, and other domains.
- Orchestrating distributed training of deep learning models across multiple GPUs or TPUs in Kubernetes clusters.
- Implementing federated learning workflows for privacy-preserving model training across distributed data sources.
- Integrating with monitoring and observability tools for tracking and analyzing ML model performance and resource utilization.
Nebius AI does not provide technical support for the product. If you have any issues, please refer to the developer’s information resources.
| Helm chart | Version |
|---|---|
| cr.nemax.nebius.cloud/yc-marketplace/nebius/kubeflow/chart/kubeflow | 0.1.0 |

| Docker image | Version |
|---|---|
| cr.nemax.nebius.cloud/yc-marketplace/nebius/kubeflow/kubeflow-installer1713877153468023385324718116425018976513706770727 | 1.0 |