1 vote

I want to understand the following terms:

hadoop (single-node and multi-node)
spark master
spark worker
namenode
datanode

What I have understood so far is that the Spark master is the job executor and manages all the Spark workers, whereas Hadoop is the HDFS (where our data resides) from which the Spark workers read data according to the job given to them. Please correct me if I am wrong.

I also want to understand the roles of the NameNode and DataNode. I know the role of the NameNode (it holds the metadata about all the DataNodes, and preferably there is only one, though there could be two), and that there can be multiple DataNodes, which hold the data.

Are DataNodes the same as Hadoop nodes?


2 Answers

5 votes

SPARK Architecture:

Spark uses a master/worker architecture. There is a driver that talks to a single coordinator, called the master, which manages the workers on which executors run.

The driver and the executors run in their own Java processes. You can run them all on the same machine (horizontal cluster), on separate machines (vertical cluster), or in a mixed machine configuration.

Nodes are nothing but the physical machines.
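To make the master/worker split concrete, here is a minimal PySpark sketch (not part of the original answer). The master URL, executor memory, and application name are assumptions for illustration: the driver builds the session and plans the work, the master schedules executors on the workers, and the executors run the computation.

```python
# Minimal sketch: a driver connecting to a hypothetical standalone Spark master.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("architecture-demo")
    .master("spark://master-host:7077")     # hypothetical master URL (assumption)
    .config("spark.executor.memory", "1g")  # memory for each executor JVM process
    .getOrCreate()
)

# Planned on the driver, executed in parallel by the executors on the workers.
print(spark.sparkContext.parallelize(range(100)).sum())

spark.stop()
```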

Hadoop NameNode and DataNode:

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes.

The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.


Yes, DataNodes are the slave nodes in a Hadoop cluster.
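As a rough illustration (the NameNode address and file path below are assumptions, not from the question), a Spark client reads an HDFS file by asking the NameNode for the block locations and then fetching the blocks from the DataNodes:

```python
# Minimal sketch: reading a file from HDFS through a hypothetical NameNode.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-read-demo").getOrCreate()

# The hdfs:// URI points at the NameNode (metadata only); the actual bytes
# are served by the DataNodes that store the file's blocks.
df = spark.read.text("hdfs://namenode-host:9000/data/events.log")
print(df.count())

spark.stop()
```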

Please refer to the documentation for more details.

0 votes

Hadoop single-node: a Hadoop cluster with 1 NameNode (master) and 1 DataNode (slave). The NameNode holds all the metadata and assigns work to the slave DataNodes, where the data is stored and the processing is done.

Hadoop multi-node: a Hadoop cluster with 1 NameNode (master) and n DataNodes (slaves).

Spark master: plays the same role as the NameNode in HDFS.

Spark worker: plays the same role as a DataNode, but a Spark worker is only meant for processing, not for storing data.

To put things in (simple) context: if there is a cluster with 1 NameNode and 2 DataNodes (1 GB each), a 2 GB file will be split and stored across the DataNodes. Similarly, a Spark job will be split to process this data on the individual DataNodes (workers) in parallel, as in the sketch below.
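A small sketch of that scenario, with assumed host names and file paths: Spark turns the file's HDFS blocks into partitions and runs one task per partition on the workers, so the parts of the file are processed in parallel.

```python
# Minimal sketch: one task per partition, partitions roughly match HDFS blocks.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("split-job-demo")
    .master("spark://master-host:7077")  # hypothetical standalone master
    .getOrCreate()
)

# Hypothetical 2 GB file stored as blocks on the DataNodes.
lines = spark.sparkContext.textFile("hdfs://namenode-host:9000/data/big-2gb-file.txt")
print("partitions (roughly one per HDFS block):", lines.getNumPartitions())

# Each worker counts the lines in its own partitions; the driver sums the results.
print("total lines:", lines.count())

spark.stop()
```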