0 votes

I am trying to set up a Spark cluster on YARN. Do I need to install YARN on all nodes? I know YARN will ship the needed jars and application code to the workers, but to receive them I presume the worker nodes need the YARN package. The master node should have HDFS and the jars installed, as well as YARN.


2 Answers

1 vote

YARN requires two processes:

  1. Resource Manager
  2. Node Manager

The Resource Manager is the master, which delegates tasks. The Node Manager is the slave, which works on the given piece of work. You have to install the Resource Manager on one machine (a production-grade server) and the Node Manager on all the slave machines (commodity hardware).
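As a rough illustration (the hostname rm.example.com is only a placeholder), every slave node running a Node Manager needs a yarn-site.xml that tells it where the Resource Manager lives, roughly like this:

    <!-- yarn-site.xml on every slave node; hostname is a placeholder -->
    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>rm.example.com</value>
      </property>
      <property>
        <!-- auxiliary shuffle service commonly enabled for MapReduce/Spark -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>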

HDFS requires the following processes:

  1. NameNode
  2. DataNode
  3. Secondary NameNode

The NameNode and the Secondary NameNode are to be installed on two separate machines (production-grade servers) and the DataNode on all slave machines (commodity hardware).

Typically DataNode and NodeManager would be installed together on all the slave nodes.
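A minimal sketch of the matching HDFS side, assuming the NameNode runs on the placeholder host nn.example.com: every node's core-site.xml points at it through fs.defaultFS (the port 9000 here is also just an example):

    <!-- core-site.xml on every node; hostname and port are placeholders -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nn.example.com:9000</value>
      </property>
    </configuration>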

0 votes

Not sure what you are trying to do exactly.

Since Hadoop 2.0, YARN is an integral part of Hadoop, so if you install Hadoop, YARN gets installed automatically.

When you use the provided scripts to start Hadoop, they bring up the HDFS stack. Then you can use the provided script to start YARN.
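For example, with a stock Hadoop distribution the start scripts live under $HADOOP_HOME/sbin (paths may differ in your packaging):

    # start HDFS (NameNode, DataNodes, Secondary NameNode)
    $HADOOP_HOME/sbin/start-dfs.sh

    # start YARN (ResourceManager and NodeManagers)
    $HADOOP_HOME/sbin/start-yarn.sh

    # verify the daemons are running
    jps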

Then you install Spark and point it at the libraries and configuration from the Hadoop installation.
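A minimal sketch of what "pointing Spark at Hadoop" usually means: export HADOOP_CONF_DIR so Spark picks up your cluster configuration, then submit with --master yarn (the config path, class, and jar below are placeholders):

    # tell Spark where the Hadoop/YARN configuration lives
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # submit an application to YARN (placeholder class and jar)
    $SPARK_HOME/bin/spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyApp \
      /path/to/my-app.jar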

There is no need to get into the messy details of installing YARN manually.