When I launch a Pig script job from the Hue Pig UI, the job starts, the progress bar sits there for 1-3 minutes, and then it turns completely red. The only output is: "Please initialize HIVE_HOME".

The Pig script I'm trying to run is the one from the Pig validation step in the HDP installation manual.

The Pig Script:
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
store B into '/tmp/id.out';

I'm trying to get the Pig Script UI in Hortonworks Data Platform's version of Hue (v2.3 according to their website) working. I'm doing this by hand, installing the packages I need using yum and manually setting the configuration. I'm working on Amazon AWS machines using CentOS 6.4. I am following this installation documentation.

Originally I believed that I needed Oozie up and running in order to execute a Pig job from Hue, but the Hue logs showed that it launches WebHCat when executing the Pig script. I therefore installed WebHCat and validated it by going to http://$WebHCat.server.full.hostname:50111/templeton/v1/status and verifying that I received a non-erroneous response, so I believe WebHCat is working correctly. I should also note that the WebHCat server is run by the Hue user, because the configuration in core-site.xml allows the Hue user to pose as the hdfs user when running jobs; this resolved a previous error I ran into while trying to get the Pig script running from the Hue UI.
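The status check described above can also be scripted. This is a minimal sketch, assuming curl is installed; the function names `webhcat_status_url` and `check_webhcat` are illustrative only, and the host name in the example is a placeholder.

```shell
# Build the WebHCat (Templeton) status URL for a host.
# Port 50111 is the one used in the question; pass a second argument to override.
webhcat_status_url() {
  local host="$1"
  local port="${2:-50111}"
  printf 'http://%s:%s/templeton/v1/status\n' "$host" "$port"
}

# Hypothetical helper: prints the response body and succeeds only if the
# endpoint answers with an HTTP success code within 10 seconds.
check_webhcat() {
  curl --silent --fail --max-time 10 "$(webhcat_status_url "$@")"
}

# Example (not run here):
#   check_webhcat webhcat.example.com
```

A successful status response only shows that the WebHCat server is up. It does not show that the environment in which WebHCat later runs the Pig launcher is set up correctly.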

So far I have tracked down the error to a code block in the actual Pig program file. The code block in question is:

# CLASSPATH initially contains $PIG_CONF_DIR
CLASSPATH="${PIG_CONF_DIR}"
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar
if [ "$includeHCatalog" == "true" ]; then
  # need to provide the hcatalog jar file path as well as
  # the location of the hive jars on which hcatalog depends
  hiveMetaStoreJar=hive-metastore-*.jar
  thriftJar=libthrift-*.jar
  hiveExecJar=hive-exec-*.jar
  fbJar=libfb303-*.jar
  jdoECJar=jdo*-api-*.jar
  slfJar=slf4j-api-*.jar
  hbaseHiveJar=hive-hbase-handler-*.jar
  if [ "$HIVE_HOME" == "" ]; then
    if [ -d "/usr/lib/hive" ]; then
      HIVE_HOME=/usr/lib/hive
    else
      echo "Please initialize HIVE_HOME"
      exit -1
    fi
  fi

The fact that this code block is causing an issue is extremely confusing, and I can't trace the error any further.

I know that this if-else clause is where the error comes from:

if [ -d "/usr/lib/hive" ]; then
  HIVE_HOME=/usr/lib/hive
else
  echo "Please initialize HIVE_HOME"
  exit -1
fi

but I don't understand why it doesn't recognize that the lib directory where Hive is installed does in fact exist.
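One way to narrow this down is to repeat the same `-d` test under the account, and on the host, that actually executes the launcher. A directory that exists for one user can fail the test for another if a parent directory is not searchable by that user. A minimal sketch, assuming sudo is available for the commented example; `dir_visible` is a hypothetical helper name.

```shell
# Report whether a path is a directory from the caller's point of view.
# Returns non-zero when the [ -d ] test fails, as the Pig launcher's check would.
dir_visible() {
  if [ -d "$1" ]; then
    echo "visible: $1"
  else
    echo "not visible: $1"
    return 1
  fi
}

# Example (not run here): repeat the launcher's test as the hue user.
#   sudo -u hue sh -c '[ -d /usr/lib/hive ] && echo visible || echo "not visible"'
```

If the test passes for the hue user on this machine, the remaining possibility worth checking is that the launcher runs somewhere else, on a host where the directory is absent.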

This is where I am stuck. I know that "/usr/lib/hive" exists: I have verified it on the machine where Hue, WebHCat, and Pig are all running. I don't know how to debug what's going wrong here. I considered setting HIVE_HOME manually, but that would probably be moot: since Pig isn't recognizing that the directory exists in the filesystem, it would presumably error out when it tried to access the path later in the code.
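Note that in the quoted excerpt the directory test only runs when `HIVE_HOME` is empty, so a non-empty value skips that check rather than repeating it. The sketch below restates that branch as a function so the behavior can be tried in isolation; `resolve_hive_home` and its default-directory argument are hypothetical names for illustration, not part of the Pig launcher.

```shell
# Mirror of the HIVE_HOME branch quoted above, with the default directory
# made a parameter so the logic can be exercised on any machine.
resolve_hive_home() {
  local default_dir="${1:-/usr/lib/hive}"
  if [ "$HIVE_HOME" = "" ]; then
    # Only reached when HIVE_HOME is unset or empty.
    if [ -d "$default_dir" ]; then
      HIVE_HOME="$default_dir"
    else
      echo "Please initialize HIVE_HOME"
      return 1
    fi
  fi
  echo "$HIVE_HOME"
}

# Example (not run here):
#   HIVE_HOME=/usr/lib/hive; resolve_hive_home    # prints /usr/lib/hive without testing the default
```

Whether an exported `HIVE_HOME` actually reaches the process that WebHCat launches depends on how that environment is configured, which this sketch does not address.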

At this point I might be missing something obvious, but I would greatly appreciate any help I can get.

2 Answers

1
votes

I had the same issue, but with a slightly different error message. This is how I fixed it.

You should already have a "hue" user in /etc/passwd. The home directory for this user should be "/usr/lib/hue". This is the default home directory assigned during installation by Hue.

  1. Create the home directory for "hue" user.

    sudo mkdir /home/hue
    
  2. Change the directory's owner and group

    sudo chown hue:hadoop /home/hue
    
  3. Change home directory in /etc/passwd for user "hue" from "/usr/lib/hue" to "/home/hue"

    hue:x:1013:492:Hue:/home/hue:/bin/bash
    

That's it. It should be fine now.
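The three steps can also be scripted, using `usermod -d` in place of hand-editing /etc/passwd. This is a minimal sketch, assuming root privileges and that the "hue" user and "hadoop" group exist as in the steps above; the `run` wrapper, the `DRY_RUN` variable, and `move_hue_home` are illustrative names, and by default the script only prints the commands it would execute.

```shell
# Print commands by default; set DRY_RUN=0 to execute them (requires root).
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the new home directory, hand it to hue:hadoop, and point the
# hue account at it (usermod -d updates the home field in /etc/passwd).
move_hue_home() {
  local new_home="${1:-/home/hue}"
  run mkdir -p "$new_home"
  run chown hue:hadoop "$new_home"
  run usermod -d "$new_home" hue
}

move_hue_home /home/hue
```

Running it unchanged prints the three commands for review. Setting `DRY_RUN=0` applies them.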

0
votes

I solved a similar problem by removing the "-useHCatalog" argument from the arguments list under the "Pig script" textarea on Hue's Pig page.