I have a cron job that runs every two minutes. Its purpose is to check for new requests and trigger a Spark job. The flow is: cron job --> calls a shell script --> calls spark-submit
source /etc/hadoop/conf/hadoop-env.sh
source /etc/spark/conf/spark-env.sh
spark-submit --executor-memory 2g --num-executors 1 --packages com.databricks:spark-csv_2.10:1.5.0 \
--py-files <some egg files location> \
<python main script> \
<configuration file> <Input Parameters>
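For reference, the crontab entry driving this looks something like the following (the script path and log location are hypothetical; redirecting output to a log file makes it easier to compare a cron run against a manual run):

```shell
# Run the request-checking script every two minutes.
# Path and log file are placeholders, not the actual values.
*/2 * * * * /opt/jobs/check_requests.sh >> /var/log/check_requests.log 2>&1
```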
When triggered manually, the script works fine, but when triggered through cron it deadlocks while trying to acquire the Spark context.
Does anybody have pointers for me on this?
Have a look at http://airbnb.io/projects/airflow/; cron is unreliable. – elcomendante