
I have a shell script, example.sh:

hive -e "select max(id) from dummy.table;" > data.txt
hdfs dfs -put -f data.txt /user/username/data.txt

This script fetches data from Hive and stores the result in HDFS. It works as expected when run from a terminal, but when I run it through an Oozie workflow, the file created is empty. If I print a hardcoded value instead, the workflow runs fine; the data is missing only when the Hive query is involved, even though the job reports success. I tried running the same thing as an HQL script and it worked:

insert overwrite directory '/user/username/hiveData' select max(id) from dummy.table;

But my requirement is to fetch the Hive result from within my shell script, as sketched below.
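
A minimal sketch of what I mean by getting the data in the script, assuming the same query as above (hive -S just suppresses the log output so only the query result is captured):

# capture the query result in a shell variable for further processing
max_id=$(hive -S -e "select max(id) from dummy.table;")
echo "${max_id}" > data.txt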

What does your workflow.xml look like? I.e. are you simply calling a shell action in it? - mazaneicha
Yes, it is a very simple one. Just calling a shell script with hive -e commands. @mazaneicha - katu_98
I guess it should be done as a single shell command, not two separate ones - leftjoin

1 Answer


Since you do not check $?, you may not know that the hive command is failing. So the first step is to add that check to your shell script; otherwise you won't know whether the script failed (because of a Hive failure), and Oozie will still record a successful run for the shell action.
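
A minimal sketch of that check, assuming the same query and paths as in the question:

hive -e "select max(id) from dummy.table;" > data.txt
if [ $? -ne 0 ]; then
    # surface the Hive failure so the Oozie shell action fails too
    echo "hive query failed" >&2
    exit 1
fi
hdfs dfs -put -f data.txt /user/username/data.txt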

So, without knowing the real reason the Hive code fails, I am making a guess.

If Hive uses Kerberos authentication, the query may be failing inside the shell script called by Oozie. To resolve the Kerberos issue, you may need to do something like this:

if [ -z "${HADOOP_TOKEN_FILE_LOCATION}" ]
then
    # no delegation token in the environment: a normal terminal run
    hive -e "select max(id) from dummy.table;" > data.txt
else
    # launched by Oozie: hand the delegation token to the Hive jobs
    hive -e "SET mapreduce.job.credentials.binary=$HADOOP_TOKEN_FILE_LOCATION; select max(id) from dummy.table;" > data.txt
fi
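
This works because the Oozie launcher exports HADOOP_TOKEN_FILE_LOCATION (the path to its delegation-token file), so the else branch runs under Oozie while a plain terminal run takes the if branch.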

You can read more about this here.