2 votes

I've got a question about the Hadoop MapReduce and Pig environments. In this thread I found that Pig Latin code is interpreted by the Pig system.

At first I thought Pig creates a .jar file with map and reduce methods, and that this file is then sent to the Hadoop MapReduce environment to run a MapReduce job (apparently that is future work for the Pig developers).

So when exactly is Hadoop MapReduce used by the Pig system? Is it somewhere during the interpretation of the Pig Latin code? Or, to put my question another way: what is the output of Pig that is sent as the input to Hadoop MapReduce?

Thanks a lot for your answer.


2 Answers

3 votes

The role of MapReduce here can be called that of an "execution engine". Pig, as a system, translates Pig Latin commands into one or more MR jobs. Pig itself does not have the capability to run them; it delegates that work to Hadoop.
I would draw an analogy with a compiler and an OS: the compiler creates a program, while the OS executes it. In this analogy Pig is the compiler and Hadoop is the OS.
Pig does a bit more, though: it runs jobs, monitors them, etc. So in addition to being a compiler, it can also be viewed as a "shell".
To the best of my understanding, Pig is not a 100% compiler in the following sense: it does not compile an MR job per command. Instead it passes information about what should be done to pre-existing jobs (I am 99% but not 100% sure here).
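In any case, you can see the translation for yourself with Pig's EXPLAIN command, which prints the logical, physical and MapReduce plans for a script without running it. A minimal sketch (the path and field names below are made up):

    -- hypothetical input: a tab-separated file of (user, url) visits
    visits  = LOAD '/data/visits' USING PigStorage('\t') AS (user:chararray, url:chararray);
    by_user = GROUP visits BY user;    -- the grouping requires a shuffle, i.e. an MR job
    counts  = FOREACH by_user GENERATE group AS user, COUNT(visits) AS n;
    -- prints the logical, physical and MapReduce plans instead of running anything
    EXPLAIN counts;
    -- STORE (or DUMP) is what actually makes Pig submit the MR job(s) to Hadoop
    STORE counts INTO '/data/visit_counts';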

2 votes

Pig's implementation of its operators uses Hadoop's API, so depending on the configuration, the job is executed either in local mode or on a Hadoop cluster. Pig is NOT passing any output to Hadoop... it sets the input types and data locations for the map-reduce job.
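For example (a hypothetical script and paths), the same script can run in either mode depending on how Pig is started, and the LOAD/STORE statements are what tell the generated map-reduce job where its input and output live:

    -- run with:  pig -x local visits.pig       (local mode, no cluster)
    --       or:  pig -x mapreduce visits.pig   (submits the job(s) to the Hadoop cluster)
    visits = LOAD '/data/visits' USING PigStorage('\t') AS (user:chararray, url:chararray);
    good   = FILTER visits BY url IS NOT NULL;
    -- LOAD/STORE define the input and output locations for the generated map-reduce job
    STORE good INTO '/data/clean_visits' USING PigStorage('\t');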

Pig Latin provides a set of standard data-processing operations, such as join, filter, group by, order by, and union, which are then mapped to map-reduce jobs. A Pig Latin script describes a directed acyclic graph (DAG), where the edges are data flows and the nodes are operators that process the data.
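A small sketch of such a DAG (hypothetical relations and paths): the FILTER, JOIN and GROUP operators are the nodes, the aliases flowing between them are the edges, and Pig maps them onto one or more map-reduce jobs:

    users   = LOAD '/data/users'  USING PigStorage(',') AS (user:chararray, age:int);
    visits  = LOAD '/data/visits' USING PigStorage(',') AS (user:chararray, url:chararray);
    adults  = FILTER users BY age >= 18;              -- filter node
    joined  = JOIN adults BY user, visits BY user;    -- join node
    grouped = GROUP joined BY visits::url;            -- group node
    result  = FOREACH grouped GENERATE group AS url, COUNT(joined) AS hits;
    STORE result INTO '/data/url_hits';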