3 votes

My Pig script works fine on its own, but as soon as I put it in an Oozie workflow, I receive the following error:

org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
...
Caused by: java.io.IOException: No FileSystem for scheme: hbase

I registered the HBase and ZooKeeper jars successfully, but still received the same error.
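For reference, registering the jars amounts to REGISTER statements at the top of the Pig script. A sketch, where the jar paths are placeholders that depend on your distribution:

-- Make the HBase and ZooKeeper classes available to the script (placeholder paths)
REGISTER /usr/lib/hbase/hbase.jar;
REGISTER /usr/lib/zookeeper/zookeeper.jar;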

I also attempted to set the ZooKeeper quorum by adding variations of this line to the Pig script:

SET hbase.zookeeper.quorum 'vm-myhost-001,vm-myhost-002,vm-myhost-003';

Some searching on the internet suggested adding this line to the beginning of my Pig script (SET is Pig syntax, so it belongs there rather than in workflow.xml):

SET mapreduce.fileoutputcommitter.marksuccessfuljobs false;
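For what it's worth, the equivalent in workflow.xml would be a configuration property on the Pig action rather than a Pig statement. A minimal sketch, with placeholder action and script names:

<action name="pig-node">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- The same setting, expressed as a Hadoop property -->
            <property>
                <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
                <value>false</value>
            </property>
        </configuration>
        <script>myscript.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
</action>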

This solved the problem. I was even able to remove the registration of the HBase and ZooKeeper jars and the ZooKeeper quorum setting.

After double-checking, I noticed that my jobs actually do their work: they store the results in HBase as expected. Oozie, however, claims that a failure occurred when it didn't.

I don't think that setting mapreduce.fileoutputcommitter.marksuccessfuljobs to false constitutes a solution.

Are there any other solutions?

1
I had the same issue when writing to Cassandra. The problem is that Oozie by default tries to create a _SUCCESS file after finishing the job. So when you disable this, the job will work, but if anything afterwards relies on the _SUCCESS file being produced, Oozie will mark the job as failed. In my case I made sure that the writing to Cassandra is isolated in its own workflow action. No idea, though, how this translates to HBase... – LiMuBei

1 Answer

0 votes

It seems that there is currently no real solution for this.

However, this answer to a different question seems to indicate that the best workaround is to create the _SUCCESS flag 'manually'.
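In an Oozie workflow, one way to do that is a dedicated fs action placed after the Pig action. A minimal sketch, assuming the fs action's touchz command is available in your Oozie version and using a placeholder output path:

<action name="mark-success">
    <fs>
        <!-- Create the _SUCCESS flag ourselves, since the job no longer writes one -->
        <touchz path="${nameNode}/user/${wf:user()}/output/_SUCCESS"/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
</action>

Anything downstream that checks for the flag will then find it, while the Pig action itself still runs with marksuccessfuljobs disabled.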