2 votes

I am trying to run PySpark on YARN with Oozie. After submitting the workflow, there are two jobs in the Hadoop job queue: the Oozie launcher job, with application type "MAPREDUCE", and a second job triggered by it, with application type "SPARK". While the first job is running, the second job stays in "ACCEPTED" status. Here is the problem: the first job is waiting for the second job to finish before it can proceed, while the second is waiting for the first one to finish before it can run, so I appear to be stuck in a deadlock. How can I get out of this situation? Is there any way for a Hadoop job with application type "MAPREDUCE" to run in parallel with jobs of other application types?

Here is a screenshot of the Hadoop jobs:

Any advice is appreciated, thanks!


1 Answer

2 votes

Please check the value of the following property in your YARN scheduler configuration. You likely need to increase it to something like 0.9.

Property: yarn.scheduler.capacity.maximum-am-resource-percent
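For reference, a minimal sketch of what that change might look like in capacity-scheduler.xml (the 0.9 value is just the suggestion above, not a verified default; tune it for your cluster):

    <!-- capacity-scheduler.xml: maximum fraction of cluster resources
         that may be used to run ApplicationMasters. The default (0.1)
         can leave the Spark AM stuck in ACCEPTED while the Oozie
         launcher's AM already holds the available AM share. -->
    <property>
      <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
      <value>0.9</value>
    </property>

This matters here because the Oozie launcher and the Spark job each need their own ApplicationMaster; with a low AM limit, only the launcher's AM gets resources, producing exactly the deadlock described in the question.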

You will need to restart YARN, MapReduce, and Oozie after updating the property.
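Alternatively, assuming you are running the CapacityScheduler, you may be able to reload the scheduler configuration in place instead of doing a full restart. A sketch (whether a given property takes effect without a restart depends on your Hadoop version, so verify on your cluster):

    # Re-read capacity-scheduler.xml on the ResourceManager
    yarn rmadmin -refreshQueues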

More info: Setting Application Limits.