2 votes

I've got a question about the Hadoop MapReduce and Pig environments. In this thread I found that Pig Latin code is interpreted by the Pig system.

At first I thought Pig creates a .jar file with map and reduce methods, and that this file is then sent to the Hadoop MapReduce environment to run a MapReduce job (apparently that is future work for the Pig developers).

So when exactly is Hadoop MapReduce used by the Pig system? Is it somewhere during the interpretation of the Pig Latin code? Or, to put my question another way: what is the output of Pig that is sent as the input to Hadoop MapReduce?

Thanks a lot for your answer.


2 Answers

3 votes

The role of MapReduce here can be called that of an "execution engine". Pig, as a system, translates Pig Latin commands into one or more MR jobs. Pig itself does not have the capability to run them; it delegates that work to Hadoop.
I would draw an analogy with a compiler and an OS: the compiler creates a program, while the OS executes it. In this analogy Pig is the compiler and Hadoop is the OS.
Pig does a bit more, though: it runs jobs, monitors them, etc. So in addition to being a compiler, it can also be viewed as a "shell".
To the best of my understanding, Pig is not a 100% compiler in the following sense: it does not compile an MR job per command. Instead it passes information about what should be done to pre-existing jobs (I am 99% but not 100% sure here).
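In any case, you can see the translation for yourself with Pig's EXPLAIN command, which prints the logical, physical and MapReduce plans for a script without running it. A minimal sketch (the path and field names below are made up):

    -- hypothetical input: a tab-separated file of (user, url) visits
    visits  = LOAD '/data/visits' USING PigStorage('\t') AS (user:chararray, url:chararray);
    by_user = GROUP visits BY user;    -- the grouping requires a shuffle, i.e. an MR job
    counts  = FOREACH by_user GENERATE group AS user, COUNT(visits) AS n;
    -- prints the logical, physical and MapReduce plans instead of running anything
    EXPLAIN counts;
    -- STORE (or DUMP) is what actually makes Pig submit the MR job(s) to Hadoop
    STORE counts INTO '/data/visit_counts';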

2 votes

Pig's implementation of its operators uses Hadoop's API, so depending on the configuration, the job is executed either in local mode or on a Hadoop cluster. Pig is NOT passing any output to Hadoop... it sets the input types and data locations for the map-reduce job.
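For example (a hypothetical script and paths), the same script can run in either mode depending on how Pig is started, and the LOAD/STORE statements are what tell the generated map-reduce job where its input and output live:

    -- run with:  pig -x local visits.pig       (local mode, no cluster)
    --       or:  pig -x mapreduce visits.pig   (submits the job(s) to the Hadoop cluster)
    visits = LOAD '/data/visits' USING PigStorage('\t') AS (user:chararray, url:chararray);
    good   = FILTER visits BY url IS NOT NULL;
    -- LOAD/STORE define the input and output locations for the generated map-reduce job
    STORE good INTO '/data/clean_visits' USING PigStorage('\t');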

Pig Latin provides a set of standard data-processing operations, such as join, filter, group by, order by, and union, which are then mapped to map-reduce jobs. A Pig Latin script describes a directed acyclic graph (DAG), where the edges are data flows and the nodes are operators that process the data.
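A small sketch of such a DAG (hypothetical relations and paths): the FILTER, JOIN and GROUP operators are the nodes, the aliases flowing between them are the edges, and Pig maps them onto one or more map-reduce jobs:

    users   = LOAD '/data/users'  USING PigStorage(',') AS (user:chararray, age:int);
    visits  = LOAD '/data/visits' USING PigStorage(',') AS (user:chararray, url:chararray);
    adults  = FILTER users BY age >= 18;              -- filter node
    joined  = JOIN adults BY user, visits BY user;    -- join node
    grouped = GROUP joined BY visits::url;            -- group node
    result  = FOREACH grouped GENERATE group AS url, COUNT(joined) AS hits;
    STORE result INTO '/data/url_hits';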