
A cluster that runs MapReduce 2 doesn't have a JobTracker; instead, its responsibilities are split into two separate components, the ResourceManager and a per-job ApplicationMaster. However, these components are transparent to the user, who doesn't need to know whether the cluster is running MapReduce 1 or 2 when submitting a MapReduce job.
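For example, as far as I understand it, a bare-bones driver like the sketch below is submitted exactly the same way on either version (the class name and the command-line input/output paths are just placeholders, and I've left the mapper and reducer unset so Hadoop's identity classes are used):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PassThroughJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "pass-through job");
            job.setJarByClass(PassThroughJob.class);
            // No mapper or reducer is set, so the identity Mapper/Reducer run;
            // the point is that nothing here mentions MR1, MR2 or YARN.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }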

What I can't quite understand is the notion of a YARN application. How is it different from a regular MapReduce application? What is the advantage of running a MapReduce job as a YARN application? Could someone shed some light on that for me?


1 Answer


MR1 has a JobTracker and TaskTrackers, which together take care of both cluster resource management and the execution of MapReduce applications.

In MR2, Apache separated the management of the map/reduce process from the cluster's resource management by introducing YARN: the ResourceManager handles cluster resources, while a per-job ApplicationMaster manages the MapReduce job itself. YARN is a better resource manager than the JobTracker we had in MR1, and it is more versatile. MR2 is built on top of YARN.
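As a rough sketch (assuming a Hadoop 2.x client; the property name is a standard Hadoop setting, but the class below is only an illustration), the client-side switch between the two runtimes is a single configuration property, normally set once in mapred-site.xml; the MapReduce job code itself does not change:

    import org.apache.hadoop.conf.Configuration;

    public class FrameworkNameSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // "yarn"  -> submit through the YARN ResourceManager (MR2);
            //            a per-job MapReduce ApplicationMaster then drives the tasks.
            // "local" -> run the whole job inside the client JVM, useful for testing.
            conf.set("mapreduce.framework.name", "yarn");
            System.out.println("MapReduce framework: "
                + conf.get("mapreduce.framework.name", "local"));
        }
    }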

Apart from MapReduce, we can run frameworks like Spark, Storm, HBase, Tez, etc. on top of YARN, which we cannot do with MR1.
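For example, with the YARN client API (a sketch assuming a Hadoop 2.x classpath with the cluster configuration on it; the class name is just an illustration), a MapReduce job shows up on the ResourceManager as one application type among others:

    import java.util.EnumSet;
    import java.util.List;

    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.YarnApplicationState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ListYarnApps {
        public static void main(String[] args) throws Exception {
            YarnClient client = YarnClient.createYarnClient();
            client.init(new YarnConfiguration());
            client.start();
            try {
                // Every running framework (MapReduce, Spark, Tez, ...) appears as an
                // application with its own ApplicationMaster; "MAPREDUCE" is one type.
                List<ApplicationReport> apps =
                        client.getApplications(EnumSet.of(YarnApplicationState.RUNNING));
                for (ApplicationReport app : apps) {
                    System.out.printf("%s  type=%s  name=%s%n",
                            app.getApplicationId(), app.getApplicationType(), app.getName());
                }
            } finally {
                client.stop();
            }
        }
    }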

The following is the architecture of MR1 versus MR2:

MR1:  HDFS <----> MapReduce

MR2:  HDFS <----> YARN <----> MapReduce