I'm using Oozie's REST API to invoke shell script actions. Any script that accesses or uses HDFS fails.

I test with this simple script:

#!/bin/bash -e
hadoop fs -touchz aaa

When I run the Oozie workflow that invokes this script from the command line (as "root"), it works fine. When I invoke it through the REST API, it fails with error code JA018 and this error message:

Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

I am certain that my Oozie assets (i.e., job.properties and workflow.xml) are set up properly, since if I swap in "touch /tmp/aaa" (i.e., a Linux touch command) the workflow runs to successful completion.

First, I can't find clear documentation of what JA018 actually is.

Second (and mainly), I think the cause of my error may be permissions. When the script works (i.e., using the Linux touch command, not HDFS touchz), I'm passing the job property user.name as "root" to the Oozie REST server. The Oozie REST server (running as "oozie" per the Cloudera configuration) does not appear to run the job as the user.name passed to it (otherwise it would presumably run the workflow as "root" and it would work). Do I need to configure the oozie ID somehow so that it can run as the user.name I pass to it? Should I use an ID other than root? (I use root precisely because it has all permissions.)

This seems like a problem others would have, but I've found next to nothing on the web. There is probably some standard way of doing this that I am simply missing.

I am using Cloudera 4.1.3.

Thanks for any thoughts / suggestions.

1 Answer

My approach and assumptions were wrong, hence the problem. It turns out that Oozie doesn't support this sort of thing, because the shell script runs as a different user from the one Oozie uses for its own internal HDFS operations. In other words, running the shell script loses the context in which Oozie performs operations within HDFS. So having Oozie invoke a shell script and expecting it to be able to do things in HDFS is the wrong approach.

Oozie is fairly limited in what it can do explicitly in HDFS, but using its built-in HDFS operations appears to be the way to perform HDFS actions within Oozie.
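
For illustration, here is a minimal sketch of what replacing the shell script with Oozie's fs action might look like in workflow.xml. It is untested on Cloudera 4.1.3 and rests on several assumptions: the workflow name, node names, and target path are hypothetical, ${nameNode} is assumed to be defined in job.properties, and the schema version used is assumed to be one your Oozie release accepts and that includes the touchz element.

```xml
<!-- Sketch only: names, path, and schema version are assumptions, not taken from the original setup. -->
<workflow-app name="touchz-example" xmlns="uri:oozie:workflow:0.4">
    <start to="touch-file"/>

    <!-- The fs action runs the HDFS operation inside Oozie itself,
         rather than from a shell script launched on a cluster node. -->
    <action name="touch-file">
        <fs>
            <!-- fs paths should be full URIs; ${nameNode} is assumed to come from job.properties. -->
            <touchz path="${nameNode}/user/${wf:user()}/aaa"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>fs action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

The fs action only covers a handful of basic operations (such as creating, deleting, and moving paths), which is consistent with the limitation noted above.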