4
votes

We have a few Spark batch jobs and streaming jobs. The batch jobs run on a Google Cloud VM and the streaming jobs run on a Google Dataproc cluster. It is becoming difficult to manage the jobs, so we want to implement some mechanism to monitor their health. Our basic requirements are to know:

  1. What time the job started and how long it took to process the data.
  2. How many records were affected.
  3. Send an alert if there is any error.
  4. Visualize the above metrics every day and take action if required.

I am not well versed in the Spark domain. I explored Stackdriver Logging in Google Dataproc but did not find the logs for the streaming jobs on the Dataproc clusters. I know the ELK stack can be used, but I wanted to know the best practices in the Spark ecosystem for this kind of requirement. Thanks.


2 Answers

1
vote

Google Cloud Dataproc writes logs and pushes metrics to Google Stackdriver, which you can use for monitoring and alerting.

Take a look at documentation on how to use Dataproc with Stackdriver: https://cloud.google.com/dataproc/docs/guides/stackdriver-monitoring

0
votes

Adding to what Igor said.

There are metrics in Stackdriver for basic things like job success/failure and duration; however, there is nothing that covers #2 (record counts).

You can follow this example to create a SparkListener and then report the metrics to the Stackdriver API directly.
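For reference, here is a minimal sketch of such a listener, assuming the Spark 2.x+ Scala listener API. The class name `JobMetricsListener` and the `println` reporting are placeholders for whatever sink you choose (Cloud Monitoring API, structured logs scraped by an agent, etc.):

```scala
import org.apache.spark.scheduler._
import scala.collection.concurrent.TrieMap
import java.util.concurrent.atomic.AtomicLong

// Collects per-job duration and output record counts.
// Replace the println in onJobEnd with a push to Stackdriver or your alerting hook.
class JobMetricsListener extends SparkListener {

  // jobId -> start timestamp (ms)
  private val jobStartTimes = TrieMap.empty[Int, Long]
  // Records written by finished tasks (rough attribution across concurrent jobs)
  private val recordsWritten = new AtomicLong(0L)

  override def onJobStart(event: SparkListenerJobStart): Unit =
    jobStartTimes.put(event.jobId, event.time)

  override def onTaskEnd(event: SparkListenerTaskEnd): Unit = {
    // taskMetrics can be null for tasks that failed before reporting metrics
    Option(event.taskMetrics).foreach { m =>
      recordsWritten.addAndGet(m.outputMetrics.recordsWritten)
    }
  }

  override def onJobEnd(event: SparkListenerJobEnd): Unit = {
    val durationMs = jobStartTimes.remove(event.jobId).map(event.time - _).getOrElse(-1L)
    val succeeded  = event.jobResult == JobSucceeded
    // Placeholder reporting: swap in a Cloud Monitoring write or an alert on failure.
    println(s"job=${event.jobId} succeeded=$succeeded durationMs=$durationMs " +
            s"recordsWrittenSoFar=${recordsWritten.get}")
  }
}
```

You can register it programmatically with `spark.sparkContext.addSparkListener(new JobMetricsListener)`, or put the jar on the classpath and set `spark.extraListeners` so the listener is attached as soon as the SparkContext starts.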