
I have a cron job which gets called every two minutes. The purpose of this job is to check for new requests and trigger a Spark job. Cron job --> calls a shell script --> calls spark-submit

# Load the Hadoop and Spark environment settings before submitting the job
source /etc/hadoop/conf/hadoop-env.sh
source /etc/spark/conf/spark-env.sh

# Submit the PySpark job (egg files, main script, and parameters elided)
spark-submit --executor-memory 2g --num-executors 1 \
             --packages com.databricks:spark-csv_2.10:1.5.0 \
             --py-files <some egg files location> \
             <python main script> \
             <configuration file> <Input Parameters>
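
Since cron runs jobs with a minimal, non-login environment, a useful first debugging step is to capture the script's output from cron. A minimal sketch of the crontab entry, assuming the script is at /path/script.sh and the log goes to /tmp/spark_cron.log (both paths are placeholders, not from the original post):

# Run every two minutes; redirect stdout/stderr to a log so failures that
# only happen under cron's stripped-down environment become visible.
*/2 * * * * /bin/sh /path/script.sh >> /tmp/spark_cron.log 2>&1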

When triggered manually, the script works fine, but when triggered through cron it goes into a deadlock trying to get the Spark context.

Does anybody have pointers for me on this?

Use http://airbnb.io/projects/airflow/; cron is unreliable. - elcomendante
How did you configure the cron job? - Mohamed Ali JAMAOUI
@MedAli: I use crontab -e and then put in the statement */2 * * * * sh /path/script.sh - Garfield
@KarolSudol: Thanks for the comment, I will definitely look at it, but as of now I won't be in a position to introduce a new component. - Garfield

1 Answer


The issue was with Kerberos. The link has the answer: Click Here
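
For context: an interactive session usually has a Kerberos ticket from the user's login, while a cron job does not, so spark-submit can hang or fail when it tries to authenticate against a kerberized cluster. A minimal sketch of acquiring a ticket inside the cron-invoked script before calling spark-submit, assuming a keytab at /etc/security/keytabs/myuser.keytab and principal myuser@EXAMPLE.COM (both are placeholders, not taken from the original post or the linked answer):

# Acquire a Kerberos ticket non-interactively before submitting the Spark job
kinit -kt /etc/security/keytabs/myuser.keytab myuser@EXAMPLE.COM

source /etc/hadoop/conf/hadoop-env.sh
source /etc/spark/conf/spark-env.sh
spark-submit ...   # same spark-submit invocation as in the question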