When I launch a Pig script job from the Hue Pig UI, a job starts, the progress bar sits there for 1-3 minutes, and then it turns completely red. The only output is: "Please initialize HIVE_HOME"
The Pig script I'm running is the one from the Pig validation step in the HDP installation manual.
The Pig Script:
A = load 'passwd' using PigStorage(':');
B = foreach A generate \$0 as id;
store B into '/tmp/id.out';
I'm trying to get the Pig Script UI in Hortonworks Data Platform's version of Hue (v2.3 according to their website) working. I'm doing this by hand, installing the packages I need using yum and manually setting the configuration. I'm working on Amazon AWS machines using CentOS 6.4. I am following this installation documentation.
Originally I believed that I needed Oozie up and running in order to execute a Pig job from Hue. While following the Hue logs, however, I saw that Hue was calling WebHCat when executing the Pig script. I therefore installed WebHCat and validated it by going to http://$WebHCat.server.full.hostname:50111/templeton/v1/status and confirming that I received a non-erroneous response, so I believe WebHCat is working correctly. I should also note that the WebHCat server is run by the Hue user, because the configuration in core-site.xml allows the Hue user to pose as the hdfs user when running jobs. That impersonation setup was the fix for an earlier error I ran into while trying to get the Pig script running from the Hue UI.
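For reference, the same status check can be done from the command line. This is only a minimal sketch: the function names `webhcat_status_url` and `check_webhcat` are my own, port 50111 is taken from the URL above, and the host is whatever your WebHCat server is called.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; only the URL path and port come from my setup.

# Build the WebHCat status URL for a host (port defaults to 50111).
webhcat_status_url() {
    local host="$1"
    local port="${2:-50111}"
    printf 'http://%s:%s/templeton/v1/status\n' "$host" "$port"
}

# Query the status endpoint. curl exits non-zero on HTTP errors (--fail)
# or if the request takes longer than 10 seconds (--max-time).
check_webhcat() {
    curl --silent --show-error --fail --max-time 10 "$(webhcat_status_url "$@")"
}
```

After sourcing the file, I would call it as `check_webhcat "$WEBHCAT_HOST"`, where `WEBHCAT_HOST` is a placeholder for the server's full hostname.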
So far I have tracked down the error to a code block in the actual Pig program file. The code block in question is:
# CLASSPATH initially contains $PIG_CONF_DIR
CLASSPATH="${PIG_CONF_DIR}"
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar
if [ "$includeHCatalog" == "true" ]; then
    # need to provide the hcatalog jar file path as well as
    # the location of the hive jars on which hcatalog depends
    hiveMetaStoreJar=hive-metastore-*.jar
    thriftJar=libthrift-*.jar
    hiveExecJar=hive-exec-*.jar
    fbJar=libfb303-*.jar
    jdoECJar=jdo*-api-*.jar
    slfJar=slf4j-api-*.jar
    hbaseHiveJar=hive-hbase-handler-*.jar
    if [ "$HIVE_HOME" == "" ]; then
        if [ -d "/usr/lib/hive" ]; then
            HIVE_HOME=/usr/lib/hive
        else
            echo "Please initialize HIVE_HOME"
            exit -1
        fi
    fi
I find it extremely confusing that this code block is causing a problem, and I have not been able to trace the error any further.
I know that the error comes from this if-else clause:
if [ -d "/usr/lib/hive" ]; then
    HIVE_HOME=/usr/lib/hive
else
    echo "Please initialize HIVE_HOME"
    exit -1
fi
What I don't understand is why it fails to recognize that /usr/lib/hive, the directory where Hive is installed, does in fact exist.
This is where I am stuck. I have verified that "/usr/lib/hive" exists on the machine where Hue, WebHCat, and Pig are all running. I don't know how to debug what's going wrong here. I considered setting HIVE_HOME manually, but that would probably be moot: if Pig doesn't see that the directory exists, it would presumably fail as soon as it tried to access that path later in the script.
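One thing I have not yet ruled out is that the check runs as a different user, or on a different host, than the one where I looked at the directory. Below is a minimal sketch of what I could run to compare. It mirrors the HIVE_HOME fallback from the excerpt above; the function names `resolve_hive_home` and `report_hive_context` are my own, and the fallback directory is a parameter only so that the logic can be exercised against other paths.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mirror the HIVE_HOME fallback in the pig script.

# Arg 1: current HIVE_HOME value (may be empty).
# Arg 2: fallback directory (defaults to /usr/lib/hive, as in the excerpt).
resolve_hive_home() {
    local current="$1"
    local fallback="${2:-/usr/lib/hive}"
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    elif [ -d "$fallback" ]; then
        printf '%s\n' "$fallback"
    else
        echo "Please initialize HIVE_HOME" >&2
        return 1
    fi
}

# Print the host, the user, and the result of the check in this context.
report_hive_context() {
    local resolved
    echo "host: $(hostname)"
    echo "user: $(id -un)"
    echo "HIVE_HOME in environment: ${HIVE_HOME:-<unset>}"
    if resolved="$(resolve_hive_home "${HIVE_HOME:-}" "${1:-/usr/lib/hive}")"; then
        echo "resolved HIVE_HOME: $resolved"
    else
        echo "resolved HIVE_HOME: <none>"
    fi
}
```

Running `report_hive_context` as the Hue user on each machine that might end up executing the Pig script would at least show whether the directory is visible in that context.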
At this point I might be missing something obvious, but I would greatly appreciate any help I can get.