Java applications are executed in the Hadoop cluster as a map-reduce job with a single Mapper task. If a Java MapReduce job (not Hive or any other job type, just a direct MapReduce job) is part of an Oozie workflow, we get a single-mapper launcher job, and the actual MapReduce job runs independently. Is there a way to link the launcher and the actual MapReduce job run? For example, can we get the job ID of the actual action from the launcher job ID? Is there any command for this?
3 Answers
You can go to the Oozie UI to get this information. Click on the action you want and open the Child Job URLs tab. There you can find all the child jobs launched by that action.
(Screenshot: Java action in Oozie, without a Child Job URLs tab)
(Screenshot: map-reduce action in Oozie, with a Child Job URLs tab)
For map-reduce actions, you can open the Child Job URLs tab and get the URLs of all the child MapReduce jobs.
The ideal way is to use the Oozie client Java API. The API lets you get the workflow ID, and from that you can get the external ID, which is the actual Hadoop job ID. Refer to this and this.
An alternative approach is to use the Oozie web services (REST) API. It returns JSON with the full job details of a particular workflow. You can then parse that JSON and extract the externalId to get the actual Hadoop job ID. Refer to this for the existing web service URLs.
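A minimal Python sketch of the web services approach, using only the standard library. It assumes an Oozie base URL of the form http://<oozie-host>:11000/oozie and the job-info endpoint GET /v1/job/<workflow-id>?show=info. The helper names fetch_workflow_info and action_job_ids are hypothetical. The externalChildIDs field is read only if present, and is assumed to be a comma-separated string, since not every Oozie version reports it.

```python
import json
import urllib.request
from typing import Any


def fetch_workflow_info(oozie_url: str, workflow_id: str) -> dict[str, Any]:
    """Fetch workflow job details from the Oozie web services API (hypothetical helper).

    oozie_url is assumed to look like "http://<oozie-host>:11000/oozie".
    """
    url = f"{oozie_url.rstrip('/')}/v1/job/{workflow_id}?show=info"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def action_job_ids(workflow_info: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each action name to its externalId and any child job IDs (hypothetical helper)."""
    result: dict[str, dict[str, Any]] = {}
    for action in workflow_info.get("actions", []):
        # externalId: the Hadoop job ID Oozie recorded for this action.
        external_id = action.get("externalId")
        # externalChildIDs: assumed comma-separated; may be absent or empty.
        raw_children = action.get("externalChildIDs") or ""
        child_ids = [c.strip() for c in raw_children.split(",") if c.strip()]
        result[action.get("name")] = {"external_id": external_id, "child_ids": child_ids}
    return result
```

For example, action_job_ids(fetch_workflow_info("http://oozie-host:11000/oozie", "<workflow-id>")) maps each action name in the workflow to the Hadoop job IDs Oozie recorded for it. Keeping the HTTP call and the JSON parsing in separate functions makes the parsing easy to check against a saved response.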
We can get the launcher ID for any child job ID from the logs link, which can be obtained from the MapReduce JobHistory Server REST API:
http://<history server http address:port>/ws/v1/history/mapreduce/jobs/<jobid>/jobattempts
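A minimal Python sketch of that request, assuming the endpoint is served by the MapReduce JobHistory Server (whose web UI listens on port 19888 by default) and asking for JSON instead of XML via the Accept header. The helper names jobattempts_url, fetch_jobattempts and logs_links are hypothetical; logs_links collects the logsLink field of each job attempt.

```python
import json
import urllib.request
from typing import Any


def jobattempts_url(history_server: str, job_id: str) -> str:
    """Build the jobattempts URL (hypothetical helper).

    history_server is "<host>:<port>" of the MapReduce JobHistory Server.
    """
    return f"http://{history_server}/ws/v1/history/mapreduce/jobs/{job_id}/jobattempts"


def fetch_jobattempts(history_server: str, job_id: str) -> dict[str, Any]:
    """Request the job attempts as JSON (the API can also return XML)."""
    req = urllib.request.Request(
        jobattempts_url(history_server, job_id),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def logs_links(jobattempts: dict[str, Any]) -> list[str]:
    """Return the logsLink of every attempt in a jobattempts response (hypothetical helper)."""
    attempts = jobattempts.get("jobAttempts", {}).get("jobAttempt", [])
    return [a["logsLink"] for a in attempts if "logsLink" in a]
```

Each returned link points at the logs of one attempt of the job, and the syslog among those logs is what the next step searches.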
The response (XML or JSON) contains the logs link of each job attempt. If we parse the syslog behind that link, we find a string like
Service: job_
Search the syslog for this pattern to find the launcher ID. If the job was started by a launcher, its ID can be found here (this works even for Java actions in an Oozie workflow). The actual line looks something like this:
INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: mapreduce.job, Service: <jobid>
The job ID after "Service:" is the launcher job ID.
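A minimal Python sketch of the search step, assuming you already have the syslog text (for example, downloaded through the logs link). It matches the "Kind: mapreduce.job, Service: <jobid>" line shown above and returns the job IDs it finds. The helper name launcher_job_ids is hypothetical, and filtering out the child's own job ID is a defensive assumption, not something the answer states.

```python
import re
from typing import List, Optional

# Matches the MRAppMaster line quoted above and captures the job ID after "Service:".
SERVICE_RE = re.compile(r"Kind: mapreduce\.job, Service: (job_\d+_\d+)")


def launcher_job_ids(syslog_text: str, own_job_id: Optional[str] = None) -> List[str]:
    """Return distinct job IDs found after 'Kind: mapreduce.job, Service:'.

    Per the answer above, such an ID is the Oozie launcher's job ID.
    Assumption: the child's own job ID is skipped in case it also appears.
    """
    found: List[str] = []
    for job_id in SERVICE_RE.findall(syslog_text):
        if job_id != own_job_id and job_id not in found:
            found.append(job_id)
    return found
```

An empty result suggests the job was not started through an Oozie launcher, or that the syslog does not contain the line.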

