I am trying to get the insight of map reduce architecture. I am consulting this http://answers.oreilly.com/topic/2141-how-mapreduce-works-with-hadoop/ article. I have some questions regarding the component JobClient of mapreduce framework. My questions is:
How the JObClient Computes the input Splits on the data?
According to the stuff to which i am consulting , Job Client computes input splits on the data located in the input path on the HDFS specified while running the job. the article says then Job Client copies the resources(jars and compued input splits) to the HDFS. Now here is my question, when the input data is in HDFS, why jobClient copies the computed inputsplits into HDFS.
Lets assume that Job Client copies the input splits to the HDFS, Now when the JOb is submitted to the Job Tracker and Job tracker intailize the job why it retrieves input splits from HDFS?
Apologies if my question is not clear. I am a beginner. :)