Kubernetes Operator for Apache Spark™

Updated January 17, 2024

Apache Spark unifies the processing of your data in batches and real-time streaming, using your preferred language: Python, SQL, Scala, Java or R. It executes fast, distributed ANSI SQL queries for dashboards and ad-hoc reporting, running faster than most data warehouses. Users can perform Exploratory Data Analysis (EDA) on petabyte-scale data without resorting to downsampling, and can train machine learning algorithms on a laptop, then use the same code to scale to fault-tolerant clusters of thousands of machines.

The Kubernetes Operator for Apache Spark, developed by Google Cloud, uses the Kubernetes operator pattern and custom resources to handle Apache Spark applications the same way as other Kubernetes workloads.

You can deploy the Kubernetes Operator for Apache Spark in your Nebius AI Managed Service for Kubernetes clusters using this Marketplace product.

Deployment instructions
  1. If you haven’t already, create a Kubernetes cluster and a node group in it.

  2. Install kubectl and configure it to work with the created cluster.

  3. Click the button in this card to go to the cluster selection form.

  4. Select the folder and the cluster, and click Continue.

  5. Configure the application:

    • Namespace: Select a namespace for Apache Spark applications or create a new one.
    • Application name: Enter the application name.
  6. Click Install.

  7. Wait for the application to change its status to Deployed.

  8. To check that the Kubernetes Operator for Apache Spark is working:

    • Deploy a sample Apache Spark application:

      Command
      kubectl apply -n <namespace> -f - <<EOF
      apiVersion: "sparkoperator.k8s.io/v1beta2"
      kind: SparkApplication
      metadata:
        name: spark-pi
      spec:
        type: Python
        pythonVersion: "3"
        mode: cluster
        image: "spark:3.5.0-scala2.12-java17-python3-ubuntu"
        imagePullPolicy: Always
        mainApplicationFile: "local:///opt/spark/examples/src/main/python/pi.py"
        sparkVersion: "3.5.0"
        restartPolicy:
          type: Never
        volumes:
          - name: "test-volume"
            hostPath:
              path: "/tmp"
              type: Directory
        driver:
          cores: 1
          coreLimit: "1200m"
          memory: "512m"
          labels:
            version: 3.5.0
          serviceAccount: spark
          volumeMounts:
            - name: "test-volume"
              mountPath: "/tmp"
        executor:
          cores: 1
          instances: 1
          memory: "512m"
          labels:
            version: 3.5.0
          volumeMounts:
            - name: "test-volume"
              mountPath: "/tmp"
      EOF
      
    • Check that a SparkApplication named spark-pi is created:

      kubectl get sparkapplications spark-pi -n <namespace> -o=yaml
      
    • Get spark-pi's logs:

      kubectl describe sparkapplication spark-pi -n <namespace>
      
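The sample manifest above references a service account named spark in its driver spec. If that account does not exist yet in your namespace, the application pods will fail to start. Below is a minimal sketch of the RBAC objects such an account typically needs; the object names and the exact permission set are assumptions, so tighten them to match your cluster's policy:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
rules:
  # The Spark driver creates and manages executor pods, services and configmaps.
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["create", "get", "list", "watch", "delete", "deletecollection"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: <namespace>
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
```

Apply it with kubectl apply -n <namespace> -f - before deploying the sample application, substituting your namespace for <namespace>.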
Billing type
Free
Type
Kubernetes® Application
Category
Machine Learning & AI
Developer tools
Publisher
Nebius
Use cases
  • Running Apache Spark applications in a Kubernetes cluster.
  • Using cron for scheduled application runs.
  • Customizing Apache Spark pods.
  • Configuring automatic policies for applications.
  • Collecting and exporting Apache Spark metrics.
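For the scheduled-runs use case, the operator provides a ScheduledSparkApplication custom resource that wraps a regular application spec in a cron schedule. Below is a minimal sketch reusing the spark-pi example above; the schedule, names and resource sizes are illustrative assumptions:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled
spec:
  schedule: "@every 1h"        # standard cron expressions are also accepted
  concurrencyPolicy: Allow     # or Forbid / Replace
  template:                    # same shape as a SparkApplication spec
    type: Python
    pythonVersion: "3"
    mode: cluster
    image: "spark:3.5.0-scala2.12-java17-python3-ubuntu"
    mainApplicationFile: "local:///opt/spark/examples/src/main/python/pi.py"
    sparkVersion: "3.5.0"
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: "512m"
      serviceAccount: spark
    executor:
      cores: 1
      instances: 1
      memory: "512m"
```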
Technical support

Nebius AI does not provide technical support for the product. If you have any issues, please refer to the developer’s information resources.

Product composition
Helm chart
  Name: spark-operator
  Version: 1.1.27
  Pull command: cr.nemax.nebius.cloud/yc-marketplace/nebius/spark-operator/chart/spark-operator
Docker image
  Name: spark-operator
  Version: v1beta2-1.3.8-3.1.1
  Pull command: cr.nemax.nebius.cloud/yc-marketplace/nebius/spark-operator/spark-operator1705507674214086237342862133349223436070191493478
Terms
By using this product you agree to the Nebius AI Marketplace Terms of Service.
License
Apache 2.0