5 votes

I installed and configured Hadoop 1.2.1 as a single node. I configured the NameNode and JobTracker addresses with ports as "hdfs://localhost:9000" and "localhost:9001" respectively.

After starting the cluster (start-all.sh), I ran netstat -nltp, which listed the Hadoop ports:

50030 - jobtracker Web UI
50060 - tasktracker Web UI
50070 - namenode Web UI
50075 - datanode Web UI
(http://localhost:50075/browseDirectory.jsp?dir=%2F)
50090 - secondary namenode Web UI

54310 - namenode (as configured in XML)
54311 - jobtracker (as configured in XML)
50010 - datanode (for data transfer)
50020 - datanode (for block metadata operations & recovery)
33447 - tasktracker (not configured; Hadoop itself picks an unused local port)
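For anyone reproducing this, here is a quick sketch that maps each listening java port back to the daemon class that owns it. It assumes GNU netstat and procps ps, and that each daemon's command line contains its org.apache.hadoop main class:

# Print "port -> Hadoop daemon class" for every listening java socket.
# Run with sudo so netstat can populate the PID/Program name column.
sudo netstat -nltp | awk '/java/ {print $4, $7}' | while read -r addr proc; do
  pid=${proc%%/*}                            # "1234/java" -> "1234"
  daemon=$(ps -o args= -p "$pid" | grep -o 'org\.apache\.hadoop\.[a-zA-Z0-9_.]*' | head -1)
  echo "${addr##*:} -> ${daemon:-unknown}"   # strip the host part, keep the port
done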

However, a couple of other ports were also occupied, and netstat showed they belonged to java processes (I stopped Hadoop and confirmed that these ports belonged to that Hadoop cluster only).

48212 - ???
41888 - ???
47448 - ???
52544 - ???

These are not fixed ports; they are chosen dynamically. When I restarted the cluster (stop-all.sh and start-all.sh), the other ports were the same as the first time, but these ports changed:

48945 - tasktracker (this is fine, as explained above)

What about the other ports? What are they used for?

44117 - ???
59446 - ???
52965 - ???
56583 - ???
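Running lsof against one of them only confirms it is a java process; a minimal check (44117 is just the value from this particular run, substitute whatever netstat shows):

# Show the process listening on one specific port, without DNS or port-name lookups
sudo lsof -nP -iTCP:44117 -sTCP:LISTEN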

3 Answers

2 votes

On a Linux system, known services are typically listed in the /etc/services file. This is where network utilities (e.g. netstat) get the friendly names for port numbers (e.g. 80/http).

Certain packages may update /etc/services when installed, but since the Hadoop ports in question are drawn from a dynamic range that changes on every start, there would be no reason to register them there.
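You can check this the same way netstat does; a trivial sketch (48212 stands in for any of the mystery ports):

# A well-known port has an entry, so netstat can print a friendly name:
grep -w '80/tcp' /etc/services
# An ephemeral port grabbed at runtime has none, so only the number shows:
grep -w '48212/tcp' /etc/services || echo "48212/tcp not registered"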

References

http://www.cyberciti.biz/faq/find-out-which-service-listening-specific-port/
http://www.tldp.org/LDP/nag2/x-087-2-appl.services.html

Hope this helps.

1 vote

Thank you for posting this interesting question, Vivek.

It intrigued me a lot, so I dug into the Apache Hadoop 1.2.1 code, specifically the startup sections for each of the master and slave daemons, but there were no additional port bindings besides the standard documented ones.

I did a couple of experiments on the ways we can start a NameNode and observed the ports using netstat -nltpa:

1) hadoop --config ../conf namenode -regular

2) Directly invoking the NameNode main class

3) Adding the default core-default.xml and then starting up the NameNode

My observation was that for #2 and #3 only the standard ports showed up, so I looked at the Java options, and bingo.
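For the curious, experiment #2 looked roughly like this (a sketch; the classpath assumes the stock Hadoop 1.2.1 tarball layout):

# Start the NameNode class directly, bypassing bin/hadoop and therefore
# all of the HADOOP_*_OPTS set in hadoop-env.sh
java -cp "$HADOOP_HOME/hadoop-core-1.2.1.jar:$HADOOP_HOME/lib/*:$HADOOP_HOME/conf" \
  org.apache.hadoop.hdfs.server.namenode.NameNode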

Comment out all of the lines below in hadoop-env.sh and then start Hadoop; you will see only the standard ports. The additional ports you see are all JMX ports:

export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
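Alternatively, instead of disabling JMX, you can pin it to a fixed port so it stops wandering. A sketch for the NameNode only (8004 is an arbitrary choice; on older JDKs the RMI server may still take a second random port unless com.sun.management.jmxremote.rmi.port is pinned too):

# hadoop-env.sh: expose the NameNode's JMX agent on a known, unauthenticated
# port instead of a random one (do NOT use these flags on an open network)
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8004 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false $HADOOP_NAMENODE_OPTS"

You can then attach jconsole to localhost:8004 to confirm the port really is JMX.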

Hope this helps.

0 votes

netstat -nltp

This shows the listening TCP sockets, numerically, along with the PID and name of the owning process. Hadoop HDFS, HBase, and ZooKeeper create a lot of sockets internally and use them for reads/writes and messaging.

A number of RPC reader threads are created in org.apache.hadoop.hdfs.DFSClient to read and write data over these connections. The hadoop.rpc.socket.factory.class.ClientProtocol property controls how the socket factory and sockets are created.
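You can see those reader threads directly; a quick sketch assuming the JDK's jps and jstack are on the PATH:

# Dump the NameNode's threads and pick out the socket/RPC reader ones
nn_pid=$(jps | awk '$2 == "NameNode" {print $1; exit}')
jstack "$nn_pid" | grep -i 'reader'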