I have a Spark job which reads a source table, does a number of map / flatten / reduce operations and then stores the results into a separate table we use for reporting. Currently this job is run manually using the spark-submit script. I want to schedule it to run every night so the results are pre-populated for the start of the day. Do I:
- Set up a cron job to call the spark-submit script?
- Add scheduling into my job class, so that it is submitted once but performs the actions every night?
- Is there a built-in mechanism in Spark or a separate script that will help me do this?
We are running Spark in Standalone mode.
Any suggestions appreciated!
cron sounds pretty reasonable to me. - maasg
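For illustration, the cron option might look roughly like the crontab entry below. This is only a sketch: the Spark install path, master URL, class name, jar path, and log file are all hypothetical placeholders, not details from the question, and would need to be replaced with the real values for the cluster.

```shell
# Hypothetical crontab entry (edit with `crontab -e` as the user that normally runs the job).
# Fields: minute hour day-of-month month day-of-week command
# Runs at 02:00 every night and appends stdout/stderr to a log file.
# Every path, host, and class name below is a placeholder assumption.
0 2 * * * /opt/spark/bin/spark-submit --class com.example.ReportJob --master spark://master-host:7077 /opt/jobs/report-job.jar >> /var/log/report-job.log 2>&1
```

Keeping the schedule in cron rather than inside the job class means the job still starts, runs, and exits each night, so it does not hold cluster resources between runs. Note that cron runs commands with a minimal environment, so any variables the job relies on (for example JAVA_HOME) may need to be set in the crontab or in a small wrapper script.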