
I want to know the communication protocol, and specifically the port numbers, used by the NameNode and DataNodes in Hadoop.

Say I run the following command on the NameNode:

hdfs dfsadmin -report

It shows the details of the live nodes (NameNode and DataNodes), how many DataNodes there are, and so on. My question is: how do the NameNode and the DataNodes communicate, and via which port? With the above command I am getting only 1 DataNode, whereas my cluster has 8. So I am not sure whether some port blocking on the network is causing this. The firewall is disabled on the NameNode and on all the DataNodes; I checked this with the sudo ufw status command, which returned inactive.

In the official Hadoop documentation (link), I found this:

The Communication Protocols

All HDFS communication protocols are layered on top of the TCP/IP protocol. A client establishes a connection to a configurable TCP port on the NameNode machine. It talks the ClientProtocol with the NameNode. The DataNodes talk to the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) abstraction wraps both the Client Protocol and the DataNode Protocol. By design, the NameNode never initiates any RPCs. Instead, it only responds to RPC requests issued by DataNodes or clients.

I am using Hadoop 3.1.1 on Ubuntu 16.04.

Any help is highly appreciated. Thanks.


1 Answer


The addresses and ports are all configured in hdfs-site.xml.

For example, by default, dfs.datanode.address=0.0.0.0:9866

If you search the default configuration reference for "port" or "address", you can generally find what you are looking for: https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
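To see which values are actually in effect on a given node, rather than the documented defaults, something like the following may help. It is only a sketch: show_hdfs_addresses is a hypothetical helper name, the list of keys is just a sample from hdfs-default.xml and core-default.xml, and it assumes the hdfs command is on the PATH of the node you run it on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the effective value of each configuration key
# passed as an argument, as resolved by the hdfs CLI on this node.
show_hdfs_addresses() {
  local key value
  for key in "$@"; do
    if value=$(hdfs getconf -confKey "$key" 2>/dev/null); then
      echo "$key=$value"
    else
      # hdfs is missing from PATH, or the key has no value on this node
      echo "$key=(unavailable)"
    fi
  done
}

# Sample keys only; search hdfs-default.xml for "address" to find the others.
show_hdfs_addresses fs.defaultFS dfs.datanode.address \
  dfs.datanode.ipc.address dfs.namenode.http-address
```

Running this on the NameNode and on one of the missing DataNodes and comparing the fs.defaultFS lines is a quick way to spot a node that points at the wrong NameNode address.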

If that command or the NameNode UI doesn't show the DataNodes, SSH to the individual nodes, run jps to see whether the DataNode process is running, and, if it is not, check the log files to find out why.
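The per-node check could be sketched roughly as below, run on each DataNode host. Since the NameNode never initiates RPCs, the connection is tested from the DataNode towards the NameNode. Everything specific here is an assumption: NN_HOST and NN_PORT are placeholders to be replaced with the host and port from your fs.defaultFS, datanode_running is a hypothetical helper name, the log path is the usual default under $HADOOP_HOME/logs (HADOOP_LOG_DIR may override it), and nc is assumed to be a netcat that supports -z.

```shell
#!/usr/bin/env bash
# Placeholders: take the real values from fs.defaultFS in core-site.xml.
NN_HOST="${NN_HOST:-namenode}"
NN_PORT="${NN_PORT:-8020}"

# Hypothetical helper: succeed if jps lists a DataNode JVM on this host.
datanode_running() {
  jps 2>/dev/null | grep -qw DataNode
}

if datanode_running; then
  echo "DataNode process is running"
  # DataNodes open the connection, so test reachability from this side.
  if nc -z -w 3 "$NN_HOST" "$NN_PORT" 2>/dev/null; then
    echo "NameNode RPC port reachable at $NN_HOST:$NN_PORT"
  else
    echo "cannot reach $NN_HOST:$NN_PORT from this node"
  fi
else
  echo "DataNode process is not running; last lines of its log:"
  # Assumed default log location and file naming.
  tail -n 50 "${HADOOP_LOG_DIR:-$HADOOP_HOME/logs}"/hadoop-*-datanode-*.log 2>/dev/null \
    || echo "no DataNode log found"
fi
```

If the process is running and the port is reachable but the node still does not appear in the report, the DataNode log is still the next place to look.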