Apache Spark is now available as a K8s operator in Marketplace

We’ve just made an important addition to our Marketplace, one that will assist you at the data processing stage of the machine learning pipeline.

Apache Spark unifies batch and real-time streaming data processing in your preferred language: Python, SQL, Scala, Java, or R. It runs distributed ANSI SQL queries for dashboards and ad-hoc reporting, faster than most data warehouses. You can perform Exploratory Data Analysis (EDA) on petabyte-scale data without resorting to downsampling. You can also train machine learning algorithms on a laptop and then use the same code to scale to fault-tolerant clusters of thousands of machines.

The Kubernetes Operator for Apache Spark, developed by Google Cloud, handles Spark applications the same way as other K8s workloads. You can now deploy the operator on your Managed Kubernetes clusters in Nebius infrastructure. It is available for free in the new Kubernetes Apps category, along with other handy tools.
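To give a feel for what "the same way as other K8s workloads" means in practice, here is a minimal sketch of a SparkApplication custom resource, the object the operator watches for. It is built as a plain Python dictionary and printed as JSON. The application name, container image, script path, Spark version, namespace, and the `spark` service account are all placeholder assumptions for illustration, not values taken from the Marketplace product, and the helper function `spark_application_manifest` is hypothetical.

```python
import json


def spark_application_manifest(name, image, main_file, executors=2):
    """Build a SparkApplication custom resource as a plain dict.

    All argument values are supplied by the caller; the defaults and
    literals below (namespace, Spark version, service account, resource
    sizes) are illustrative assumptions, not recommended settings.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "type": "Python",               # language of the Spark job
            "mode": "cluster",              # driver runs inside the cluster
            "image": image,                 # container image with Spark and the job
            "mainApplicationFile": main_file,
            "sparkVersion": "3.5.0",        # assumed; match your image
            "restartPolicy": {"type": "Never"},
            "driver": {
                "cores": 1,
                "memory": "512m",
                "serviceAccount": "spark",  # assumed service account name
            },
            "executor": {
                "cores": 1,
                "instances": executors,     # number of executor pods
                "memory": "512m",
            },
        },
    }


if __name__ == "__main__":
    manifest = spark_application_manifest(
        name="example-job",                       # hypothetical name
        image="registry.example.com/spark-job:latest",  # hypothetical image
        main_file="local:///opt/app/job.py",      # hypothetical path in the image
        executors=2,
    )
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and passing it to `kubectl apply -f` would submit the job like any other Kubernetes resource, provided the operator is installed and the assumed service account exists in the cluster.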

ML flow stages simplified by Spark

author
Anna Simakov
Product & Process Manager