
I just started learning Hadoop and Pig (only two days ago!) for an upcoming project.

To experiment, I installed Hadoop in pseudo-distributed mode (HDFS on the default localhost:9000) and Pig in MapReduce mode.

When I started Pig with the ./bin/pig command, it launched the Grunt shell and printed a message saying that Pig had connected to HDFS (localhost:9000). After that, I could access HDFS through Pig without any problems.

Based on various articles online, I had expected to do some manual configuration before Pig could access HDFS.

My question is: where did Pig pick up the default HDFS configuration (localhost:9000)? I checked pig.properties but found nothing there. I need to know this because I may change the default HDFS configuration in the future.

BTW, I have HADOOP_HOME and PIG_HOME defined in my OS PATH variable.

It's easy to find out: open the pig shell script (bin/pig), where you can see how it sets the paths of the Hadoop variables. - Rajendra Jangir

2 Answers


When installing Pig (I assume v0.10.0), you have to tell it how to connect to HDFS. I don't know how you did this, but generally it is done by adding the Hadoop configuration directory to the PIG_CLASSPATH environment variable. Alternatively, you can set HADOOP_CONF_DIR.

When you start the Grunt shell, Pig locates the directory containing the Hadoop configuration XML files and reads fs.default.name (from core-site.xml) and mapred.job.tracker (from mapred-site.xml), i.e. the locations of the NameNode and the JobTracker.
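To make that lookup concrete, here is a minimal Python sketch (not Pig's actual code) that reads those two properties from a Hadoop configuration directory. It assumes the standard configuration/property/name/value layout of the *-site.xml files, and the helper names read_hadoop_property and cluster_endpoints are hypothetical. Because Hadoop 2.x renamed fs.default.name to fs.defaultFS (keeping the old name as a deprecated alias), the sketch checks the newer key first. It deliberately ignores Hadoop's built-in default resources, "final" properties and variable substitution.

```python
import os
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_path, name):
    """Return the <value> of property `name` in a Hadoop *-site.xml file, or None."""
    if not os.path.isfile(xml_path):
        return None
    root = ET.parse(xml_path).getroot()
    # Site files look like <configuration><property><name/><value/></property>...</configuration>.
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def cluster_endpoints(conf_dir):
    """Return (NameNode URI, JobTracker address) read from conf_dir, None where unset."""
    core_site = os.path.join(conf_dir, "core-site.xml")
    mapred_site = os.path.join(conf_dir, "mapred-site.xml")
    # Hadoop 2.x calls the key fs.defaultFS; fs.default.name is the older, deprecated name.
    namenode = (read_hadoop_property(core_site, "fs.defaultFS")
                or read_hadoop_property(core_site, "fs.default.name"))
    jobtracker = read_hadoop_property(mapred_site, "mapred.job.tracker")
    return namenode, jobtracker
```

For the setup in the question, a core-site.xml containing fs.default.name set to hdfs://localhost:9000 would make cluster_endpoints report that address as the NameNode, matching what Grunt printed.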

For reference, have a look at the Pig shell script (bin/pig) to see how the environment variables are collected and evaluated.
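A rough, hypothetical Python approximation of that search is sketched below. It is not a translation of bin/pig: the order used here (PIG_CLASSPATH entries, then HADOOP_CONF_DIR, then the default etc/hadoop directory of Hadoop 2.x and conf directory of Hadoop 1.x under HADOOP_HOME) is an illustrative assumption, and the real script builds a Java classpath rather than picking a single directory. The names candidate_conf_dirs and find_conf_dir are made up for this example.

```python
import os


def candidate_conf_dirs(env):
    """Hypothetical, simplified list of places to look for Hadoop *-site.xml files."""
    dirs = []
    # 1. Directories placed on PIG_CLASSPATH (assumed to use the OS path separator).
    for entry in env.get("PIG_CLASSPATH", "").split(os.pathsep):
        if entry:
            dirs.append(entry)
    # 2. HADOOP_CONF_DIR, if it is set.
    if env.get("HADOOP_CONF_DIR"):
        dirs.append(env["HADOOP_CONF_DIR"])
    # 3. Default layouts under HADOOP_HOME: etc/hadoop (Hadoop 2.x), conf (Hadoop 1.x).
    home = env.get("HADOOP_HOME")
    if home:
        dirs.append(os.path.join(home, "etc", "hadoop"))
        dirs.append(os.path.join(home, "conf"))
    return dirs


def find_conf_dir(env):
    """Return the first candidate directory that actually contains core-site.xml, or None."""
    for d in candidate_conf_dirs(env):
        if os.path.isfile(os.path.join(d, "core-site.xml")):
            return d
    return None
```

Called as find_conf_dir(os.environ) with only HADOOP_HOME set, as in the question, a lookup of this kind would land on $HADOOP_HOME/etc/hadoop (or conf on Hadoop 1.x), which is consistent with Pig finding localhost:9000 without any edits to pig.properties.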


Pig can connect to the underlying HDFS in three ways:

1- Pig uses HADOOP_HOME to find the Hadoop client to run. Your HADOOP_HOME has probably already been set up in your .bash_profile, e.g. export HADOOP_HOME=~/myHadoop/hadoop-2.5.2

2- Alternatively, your HADOOP_CONF_DIR may already be set up; it points to the directory containing the Hadoop configuration XML files, e.g. export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/

3- If neither of these is set up, you can also connect to the underlying HDFS by editing pig.properties, which is located in the PIG_HOME/conf directory.
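For the third option, the entries of interest are typically the same two Hadoop keys, for example a line such as fs.default.name=hdfs://localhost:9000. The Python sketch below only reads them back out of a pig.properties file; it assumes, as this answer implies, that Pig honours Hadoop keys placed there. Its parser is a simplified stand-in for Java's properties format (no line continuations, escapes or whitespace-only separators), and load_simple_properties and pig_properties_endpoints are hypothetical helper names.

```python
def load_simple_properties(path):
    """Parse a Java-style .properties file into a dict.

    Simplified: handles 'key=value' / 'key:value' lines and '#'/'!' comments,
    but not line continuations, escapes or whitespace-only separators.
    """
    props = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line[0] in "#!":
                continue  # blank line or comment
            key, value = line, ""
            # Split on the first '=' or ':' so values like hdfs://host:port stay intact.
            for i, ch in enumerate(line):
                if ch in "=:":
                    key, value = line[:i], line[i + 1:]
                    break
            props[key.strip()] = value.strip()
    return props


def pig_properties_endpoints(path):
    """Return (fs.default.name, mapred.job.tracker) from a pig.properties file, None if absent."""
    props = load_simple_properties(path)
    return props.get("fs.default.name"), props.get("mapred.job.tracker")
```

Splitting on the first separator, rather than the last, is what keeps the port in hdfs://localhost:9000 attached to the value.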