
Kubeflow

Updated May 21, 2024

Kubeflow is an open-source platform dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. It provides an ML stack for Kubernetes, consisting of TensorFlow, Jupyter notebooks, and other ML tools. With Kubeflow, data scientists and engineers can build, train, and deploy machine learning models in a consistent and reproducible manner across different environments. Kubeflow enables you to run end-to-end ML workflows on Kubernetes, from data preparation to model training and serving, using the same infrastructure and tools. It simplifies the deployment process by abstracting away the complexities of managing Kubernetes resources, allowing you to focus on developing and iterating ML models.

You can deploy Kubeflow in your Nebius AI Managed Service for Kubernetes clusters using this Marketplace product. The product deploys Kubeflow through deployKF.

Warning

Before installing Kubeflow, you must install NVIDIA® GPU Operator and Argo CD on the cluster. For details, see the deployment instructions below.

Deployment instructions
  1. If you want to use InfiniBand networking for GPUs in your Kubernetes cluster, create a GPU cluster.

  2. Create a Kubernetes cluster and a node group with GPUs in it. If you have created a GPU cluster, select it when creating the node group.

  3. Install kubectl and configure it to work with the created cluster.

  4. Install the required Marketplace products on the cluster in the following order (you can verify the installations with the checks shown after this list):

    1. NVIDIA® GPU Operator

    2. NVIDIA® Network Operator (if you use a GPU cluster for InfiniBand networking)

    3. Argo CD with the following requirements:

      • During the installation, the application and its namespace must be called argocd.
      • You must not change the administrator’s password.
  5. Create an Object Storage bucket as a repository for Kubeflow Pipelines artifacts (a pipeline root).

  6. Configure access to the created Object Storage bucket:

    1. Create a service account and add it to the editors group.

    2. Create a static access key for the service account.
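
To confirm that kubectl is configured correctly and that the prerequisite products are running, you can list the cluster nodes and the pods of each component before proceeding. This is a minimal sketch: the gpu-operator and network-operator namespaces below are assumed defaults for the NVIDIA operators and may differ in your cluster, while Argo CD must run in the argocd namespace for this product.

    # Confirm that kubectl points at the right cluster and the GPU nodes are registered
    kubectl get nodes -o wide

    # NVIDIA GPU Operator pods (assumes the default "gpu-operator" namespace)
    kubectl get pods -n gpu-operator

    # NVIDIA Network Operator pods, if installed (assumes "network-operator")
    kubectl get pods -n network-operator

    # Argo CD pods; the namespace must be "argocd" for this product
    kubectl get pods -n argocd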

To install the product:

  1. Click the button in this card to go to the cluster selection form.

  2. Select your cluster and click Continue.

  3. Configure the application:

    • Namespace: Select a namespace or create one.

    • Application name: Enter an application name.

    • Storage bucket name: Enter the Object Storage bucket name created previously.

    • S3 access key: Paste the ID of the static access key created previously.

    • S3 secret key: Paste the contents of the static access key created previously.

    • Kubeflow hostname: Enter the domain name that deployKF (a component of this product) will use. Default: deploykf.example.com

    • Kubeflow admin password: Create and enter a password for the Kubeflow admin. The administrator can manage profiles and contributors of other users. For more details on multi-user isolation, see the Kubeflow documentation.

    • Kubeflow user1 password: Enter a password for the preconfigured user1 account.

  4. Click Install.

  5. Wait for the application to change its status to Deployed. This might take some time. You can check on the progress in the Argo CD UI; for instructions on how to access it, see the Argo CD deployment instructions.

  6. To check that Kubeflow is working, access the deployKF dashboard. To do this, modify the hosts file on your machine and configure port forwarding; a minimal sketch of this setup is shown below. You can log in to the dashboard as either admin@example.com or user1@example.com with the respective passwords you entered previously. For more details, see the deployKF documentation.
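
The sketch below assumes you kept the default hostname deploykf.example.com and that the gateway Service uses the deployKF default name and namespace; verify them with kubectl get svc -n deploykf-istio-gateway before forwarding.

    # 1. Map the Kubeflow hostname to localhost in your hosts file
    #    (/etc/hosts on Linux and macOS):
    #      127.0.0.1 deploykf.example.com
    #      127.0.0.1 argo-server.deploykf.example.com

    # 2. Forward the deployKF gateway to your machine (the service and namespace
    #    names are the deployKF defaults and may differ in your deployment):
    kubectl port-forward -n deploykf-istio-gateway svc/deploykf-gateway 8080:http 8443:https

    # 3. Open https://deploykf.example.com:8443/ in a browser and log in as
    #    admin@example.com or user1@example.com.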

Users and profiles

After installing Kubeflow, you can create users and profiles.

Creating users

In this product, users are managed by Dex.

  1. Get the current Dex configuration:

    
    kubectl get secret dex-config -n deploykf-auth -o jsonpath="{.data.config\.yaml}" \
      | base64 --decode > config.yaml
    
    
  2. Modify the resulting config.yaml file to add users; a hypothetical fragment is shown after this procedure. For details, see the example configuration in Dex’s GitHub repository.

  3. Apply the modified configuration:

    
    kubectl create secret generic dex-config --from-file=config.yaml \
      --namespace=deploykf-auth --dry-run=client -o yaml \
      | kubectl apply -f -
    
    
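For example, local users can be added under staticPasswords, following the example configuration in Dex’s repository. The fragment below is a hypothetical sketch: the e-mail, username, and userID values are placeholders, and hash must be a bcrypt hash of the user’s password.

    # config.yaml (fragment) - hypothetical example of one additional local user.
    # "hash" must be a bcrypt hash of the password, e.g. generated with:
    #   htpasswd -bnBC 10 "" 'PASSWORD' | tr -d ':\n'
    staticPasswords:
      - email: "user2@example.com"
        hash: "<bcrypt-hash-of-the-password>"
        username: "user2"
        userID: "user2-id"
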
Creating profiles
  1. Create a profile configuration file and apply it according to the Kubeflow documentation; a minimal example manifest is shown after this list.

  2. Set up the new profiles to use the existing credentials (the static access key) for Kubeflow Pipelines:

    1. Get the secret that contains the credentials:

      
      kubectl get secret cloned--kubeflow-pipelines--backend-object-store-auth \
        -n kubeflow -o yaml > secret.yaml
      
      
    2. In the resulting secret.yaml file, replace the namespace name with the name of a profile.

    3. Add the modified secret to the cluster:

      
      kubectl apply -f secret.yaml
      
      
    4. Repeat steps 2.2–2.3 for each created profile.
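
A minimal profile manifest, following the Profile resource format from the Kubeflow documentation (the profile name and owner e-mail below are placeholders; save it as, for example, profile.yaml and apply it with kubectl apply -f profile.yaml):

    apiVersion: kubeflow.org/v1
    kind: Profile
    metadata:
      name: team-a                  # the profile name also becomes its namespace
    spec:
      owner:
        kind: User
        name: user1@example.com     # must match an e-mail known to Dex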

Uninstalling

Warning

Do not uninstall Kubeflow while it is being installed. Your cluster may become corrupted as a result.

To uninstall Kubeflow, uninstall the products in the following order:

  1. Kubeflow

  2. Argo CD

  3. NVIDIA® Network Operator (if installed)

  4. NVIDIA® GPU Operator

Some custom resource definitions (CRDs) may still remain in your cluster after uninstalling Kubeflow.
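
To find and, if necessary, remove such leftovers, you can list the CRDs and delete the ones you no longer need. The name patterns below are assumptions about what Kubeflow and its dependencies typically install; review the list before deleting anything.

    # List CRDs that look related to Kubeflow, Argo, or Istio (patterns are assumptions)
    kubectl get crds | grep -iE 'kubeflow|argoproj|istio'

    # Delete a specific leftover CRD (this also deletes all its custom resources)
    kubectl delete crd <crd-name>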

Billing type: Free
Type: Kubernetes® Application
Category: Training, Inference, LLM apps framework
Publisher: Nebius
Use cases
  • Automated ML model deployment and scaling on Kubernetes clusters.
  • Serving ML models at scale in production environments.
  • Building and deploying custom machine learning pipelines for various applications.
  • Implementing continuous integration and continuous deployment (CI/CD) workflows for ML models.
  • Integrating with existing Kubernetes-based infrastructure for seamless deployment and management of ML workloads.
  • Developing and deploying machine learning applications for natural language processing (NLP), computer vision, and other domains.
  • Orchestrating distributed training of deep learning models across multiple GPUs or TPUs in Kubernetes clusters.
  • Implementing federated learning workflows for privacy-preserving model training across distributed data sources.
  • Integrating with monitoring and observability tools for tracking and analyzing ML model performance and resource utilization.
Technical support

Nebius AI does not provide technical support for the product. If you have any issues, please refer to the developer’s information resources.

Product composition
Helm chart: cr.nemax.nebius.cloud/yc-marketplace/nebius/kubeflow/chart/kubeflow
Version: 0.1.0

Docker image: cr.nemax.nebius.cloud/yc-marketplace/nebius/kubeflow/kubeflow-installer1713877153468023385324718116425018976513706770727
Version: 1.0
Terms
By using this product, you agree to the Nebius AI Marketplace Terms of Service. License: Apache 2.0.