Frequently Asked Questions

Why would I use Spark Performance Advisor?

If you run many Spark jobs and incur significant cost, you probably want to know where to focus your optimization efforts. We can suggest which jobs or teams to start with, and which optimization approaches to use.

We can also detect when a suboptimal job is added, or when an existing job degrades (perhaps due to a change in data), so that no performance regression slips by unnoticed.

If you have just a few jobs, you can run them once with Spark Advisor to see how well they perform, or to get help optimizing them.

How does it work?

We use the Spark listener mechanism. As each stage of your job completes, its statistics are submitted to our service, processed, and made available for review.
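
To make the mechanism concrete, here is a minimal sketch of a custom listener, assuming Spark 3.x with Scala 2.12 or 2.13 on the classpath. `SparkListener`, `onStageCompleted`, `StageInfo` and `TaskMetrics` are Spark's own developer APIs; the class name `StageMetricsListener`, the `formatReport` helper, and printing the report instead of sending it to a service are illustrative assumptions, not how the Spark Performance Advisor listener is actually written.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Hypothetical listener for illustration; it is not the Spark Performance Advisor listener.
class StageMetricsListener extends SparkListener {

  // Spark calls this on the driver each time a stage finishes (successfully or not).
  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    // taskMetrics holds values aggregated over the stage's tasks; it may be null, so guard it.
    Option(info.taskMetrics).foreach { metrics =>
      val report = StageMetricsListener.formatReport(
        stageId = info.stageId,
        name = info.name,
        numTasks = info.numTasks,
        executorRunTimeMs = metrics.executorRunTime,
        shuffleReadBytes = metrics.shuffleReadMetrics.totalBytesRead
      )
      // Assumption: a real service would send this over the network; the sketch just prints it.
      println(report)
    }
  }
}

object StageMetricsListener {
  // Builds a one-line summary that contains only performance metrics, never row data.
  def formatReport(
      stageId: Int,
      name: String,
      numTasks: Int,
      executorRunTimeMs: Long,
      shuffleReadBytes: Long
  ): String =
    s"stage=$stageId name=$name tasks=$numTasks runTimeMs=$executorRunTimeMs shuffleReadBytes=$shuffleReadBytes"
}
```

Such a class could be registered with `--conf spark.extraListeners=<fully qualified class name>` on `spark-submit`, or by calling `spark.sparkContext.addSparkListener(new StageMetricsListener)` on the driver. Keeping `formatReport` as a plain function makes the report format testable without a running cluster.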

Our UI shows time charts, where you can see performance trends and notice if something broke. You can also review the most problematic recent job executions.

How much does it cost?

The service is free, and we hope that typical usage will remain free. By “typical usage” we mean common Spark environments and hundreds to thousands of job runs per day.

We reserve the right to apply fair-use restrictions, and we might develop new paid features.

What environments does it support?

The product has been tested with:

  • Open-source Spark running on Kubernetes
  • Databricks
  • AWS EMR
  • GCP Dataproc

We expect it to work in other setups that use Spark 3.* with Scala 2.12 or 2.13 and any version of Java.

Is it secure?

Yes.

First of all, we never collect your data, only performance metrics. You can review the listener source code to verify this.

Beyond that, we use reasonable security practices to protect the data:

  • The service runs in an isolated environment, including a separate VPC, separate Kubernetes clusters, and separate databases.
  • Access to the environment is limited to the project team.
  • The source code was reviewed by the security department.

Is it open-source?

While the listener implementation is open-source, the rest of the service is not.

We love open-source, and some of us have contributed to open-source for a decade, but we don’t plan to make the service open-source at the moment.