1
votes

I tried to kill an Oozie coordinator job like this:

$ oozie job -oozie http://10.0.3.2:11000/oozie -kill 0003288-130913181709024-oozie-oozi-C

No error is reported.

But when I check the Oozie UI, the job is still there.

I killed several jobs, then deployed a code update to Oozie and started a new job. Since the old jobs cannot be killed, there are now many jobs for the same project in the 'RUNNING' state on Oozie.

I could kill jobs before, but I cannot now. So, how can I force-kill a RUNNING job? Do you know what might have caused this?

Thanks very much.

4
Have you checked the oozie server logs for any error / warning messages? - Chris White
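
One way to follow up on this suggestion is to search the server log for WARN or ERROR lines that mention the job ID. The following is a minimal sketch, not a definitive procedure: the helper name job_log_problems is hypothetical, and the log path in the example call is an assumption that varies by installation.

```shell
# Hypothetical helper: print WARN/ERROR lines that mention a given job ID.
# Usage: job_log_problems <log-file> <job-id>
job_log_problems() {
    log_file=$1
    job_id=$2
    # -F treats the job ID as a fixed string, not a regular expression.
    grep -F -- "$job_id" "$log_file" | grep -E 'WARN|ERROR'
}

# Example call (the log path is an assumption; adjust for your installation):
# job_log_problems /var/log/oozie/oozie.log 0003288-130913181709024-oozie-oozi-C
```

If nothing is printed, the log contains no warning or error lines for that job ID, which may mean the kill request never reached the server.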

4 Answers

4
votes

I sometimes encounter this error in my testing environment, where Oozie uses Derby as its database. The solution is to clear Oozie's state by removing the database:

sudo /etc/init.d/oozie stop
sudo rm -rf /var/lib/oozie/oozie-db/
sudo /etc/init.d/oozie start

Of course, this solution may not be appropriate for a production system (although I've never seen this error in production).
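
Because removing the directory discards all stored job state, it may be safer to move it aside rather than delete it. A minimal sketch, assuming the Derby path from the commands above; the helper name move_oozie_db_aside and the timestamped .bak suffix are illustrative choices, not part of Oozie.

```shell
# Hypothetical helper: rename the Derby database directory instead of
# deleting it, so the old state can be restored if needed.
# Usage: move_oozie_db_aside <db-dir>
move_oozie_db_aside() {
    db_dir=${1%/}    # strip one trailing slash, if present
    backup_dir="${db_dir}.bak.$(date +%Y%m%d%H%M%S)"
    [ -d "$db_dir" ] || { echo "no such directory: $db_dir" >&2; return 1; }
    # Print the backup location only if the move succeeded.
    mv -- "$db_dir" "$backup_dir" && echo "$backup_dir"
}

# Run between stopping and starting Oozie, with sufficient privileges, e.g.:
# move_oozie_db_aside /var/lib/oozie/oozie-db/
```

The function prints the backup location, so the directory can be moved back if the old state is needed again.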

0
votes

Although this has been answered for Derby, we have also seen this behavior when using Oozie with a PostgreSQL database, so I'm posting our solution here for anyone else who runs into this issue in the future. The solution ended up being to clone the database schema using pg_dump:

pg_dump --schema-only $OOZIE_DB > outfile
createdb -O $OOZIE_USER $NEW_DB_NAME
psql $NEW_DB_NAME < outfile

This gives you a clean instance of the Oozie database: the schema is copied, but none of the job records are. From there, update the Oozie configuration to use the new (clean) database, and restart the Oozie server.

-1
votes

I have also faced this issue.

Simply list all the jobs in Oozie first, then kill the desired job:

arif@ubuntu:~/applications/hadoop/oozie-4.3.0$ bin/oozie jobs
Job ID                                   App Name     Status    User      Group     Started                 Ended                   
------------------------------------------------------------------------------------------------------------------------------------
0000000-171229155700312-oozie-arif-W     sqoop-wf     RUNNING   arif      -         2017-12-29 10:55 GMT    -                       
------------------------------------------------------------------------------------------------------------------------------------
0000002-171229093438895-oozie-arif-W     sqoop-wf     FAILED    arif      -         2017-12-29 06:30 GMT    2017-12-29 11:39 GMT    
------------------------------------------------------------------------------------------------------------------------------------
0000001-171229093438895-oozie-arif-W     sqoop-wf     FAILED    arif      -         2017-12-29 06:21 GMT    2017-12-29 06:21 GMT    
------------------------------------------------------------------------------------------------------------------------------------
0000000-171229093438895-oozie-arif-W     sqoop-wf     FAILED    arif      -         2017-12-29 06:13 GMT    2017-12-29 06:13 GMT    
------------------------------------------------------------------------------------------------------------------------------------
arif@ubuntu:~/applications/hadoop/oozie-4.3.0$ bin/oozie jobs -jobtype coordinator
No Jobs match your criteria!
arif@ubuntu:~/applications/hadoop/oozie-4.3.0$ bin/oozie job -kill 0000000-171229155700312-oozie-arif-W
arif@ubuntu:~/applications/hadoop/oozie-4.3.0$ bin/oozie jobs
Job ID                                   App Name     Status    User      Group     Started                 Ended                   
------------------------------------------------------------------------------------------------------------------------------------
0000000-171229155700312-oozie-arif-W     sqoop-wf     KILLED    arif      -         2017-12-29 10:55 GMT    2017-12-29 11:54 GMT    
------------------------------------------------------------------------------------------------------------------------------------
0000002-171229093438895-oozie-arif-W     sqoop-wf     FAILED    arif      -         2017-12-29 06:30 GMT    2017-12-29 11:39 GMT    
------------------------------------------------------------------------------------------------------------------------------------
0000001-171229093438895-oozie-arif-W     sqoop-wf     FAILED    arif      -         2017-12-29 06:21 GMT    2017-12-29 06:21 GMT    
------------------------------------------------------------------------------------------------------------------------------------
0000000-171229093438895-oozie-arif-W     sqoop-wf     FAILED    arif      -         2017-12-29 06:13 GMT    2017-12-29 06:13 GMT    
------------------------------------------------------------------------------------------------------------------------------------

Hope it helps someone, thanks.
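
When several stale jobs are RUNNING, their IDs can be pulled out of the listing rather than copied by hand. A minimal sketch, assuming the column layout shown in the listing above and app names that contain no spaces; running_job_ids is a hypothetical helper, not an Oozie command.

```shell
# Hypothetical helper: read `oozie jobs` output on stdin and print the IDs
# of jobs whose Status column is RUNNING.
# Assumes the layout shown above: job ID in column 1, app name (without
# spaces) in column 2, status in column 3.
running_job_ids() {
    awk '$3 == "RUNNING" { print $1 }'
}

# Example: kill every RUNNING workflow job (review the ID list first):
# bin/oozie jobs | running_job_ids | while read -r id; do
#     bin/oozie job -kill "$id"
# done
```

The header and separator lines are skipped because their third field is never RUNNING.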

-1
votes

This happened with my Oozie server as well, in a testing environment. It was caused by the ResourceManager being down when Oozie tried to submit the job to it.

To overcome this, I removed this job's entry from my MySQL table (instead of deleting the whole database) and restarted the job:

mysql> delete from WF_JOBS where id="Wf-id";