8 votes

I am learning Hadoop and am a bit confused about the default ports and where they are configured.

When I hit the URL localhost:50070, it returns the HDFS info page. The Hadoop docs mention the following ports:

hdfs-default.xml

dfs.datanode.http.address   0.0.0.0:50075 
dfs.datanode.address     0.0.0.0:50010
dfs.namenode.http-address    0.0.0.0:50070
dfs.namenode.backup.http-address    0.0.0.0:50105

mapred-default.xml

mapreduce.jobtracker.http.address   0.0.0.0:50030
mapreduce.tasktracker.http.address  0.0.0.0:50060

yarn-default.xml

yarn.resourcemanager.address     ${yarn.resourcemanager.hostname}:8032
yarn.resourcemanager.webapp.address  ${yarn.resourcemanager.hostname}:8088

Now, while configuring Hadoop 2 on my machine, I did:

$ cd /usr/local/hadoop/etc/hadoop
$ vi core-site.xml

<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>

Questions: there are so many ports mentioned in the default and other XML files in the docs ...

1) Only localhost:50070 returns some meaningful data (HDFS health). What about the other ports? They just don't seem to return any information.

2) In yarn-default.xml both are ResourceManager ports; the difference is that one is the webapp port. Only when I hit localhost:8088 in the browser do I get the cluster (single node in this case) information. Then what is port 8032 for? In some sample code I see 8032 used as the RM port. It is not clear to me. Can someone please explain?

3) I changed the HDFS port to 9000. Is that standard?

4) How do I see the ApplicationMaster, JobTracker, and TaskTracker ports?

5) I thought that in YARN (Hadoop 2) there is no JobTracker or TaskTracker, so what is the purpose of these ports?

I am having a nightmare with these basic questions...

Thanks, Amit


4 Answers

2 votes

Hadoop provides web UIs for peeking into the cluster. They let you check the status of the cluster, job details (running, failed), and so on through a browser, which is a great relief, because you don't have to remember all the corresponding commands and run them from a terminal. You have already pointed out some of the important ports needed for this; those are the defaults, and you can change them in the configuration files.

Now I will answer your questions one by one. Judging from your core-site.xml, I assume Hadoop is running in pseudo-distributed mode.

1) Only localhost:50070 returns some meaningful data (HDFS health). What about the other ports? They just don't seem to return any information.

I will explain it using the details you provided, to avoid confusion.

Some of the other ports are also meant to be opened from a browser, like localhost:50075 for viewing DataNode details and localhost:8088 for viewing currently running and completed jobs. Properties whose names do not contain http-address or webapp.address are used for inter-process communication (IPC) rather than for the browser; examples of such ports are 8032, 50010, etc.
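As a rough illustration (assuming a default pseudo-distributed setup on localhost), the HTTP ports answer a plain request while the IPC ports do not:

# Web UI ports respond to plain HTTP:
curl -s http://localhost:50070/   # NameNode web UI (HDFS health)
curl -s http://localhost:50075/   # DataNode web UI
curl -s http://localhost:8088/    # ResourceManager web UI (applications)

# IPC ports such as 8032 (ResourceManager client RPC) or 50010 (DataNode data
# transfer) speak Hadoop's binary RPC/streaming protocols, so a browser or curl
# gets no readable page back from them.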

2) In yarn-default.xml both are ResourceManager ports; the difference is that one is the webapp port. Only when I hit localhost:8088 in the browser do I get the cluster (single node in this case) information. Then what is port 8032 for? In some sample code I see 8032 used as the RM port. It is not clear to me. Can someone please explain?

I hope the answer above clears this up: 8088 is the ResourceManager's web UI port for your browser, while 8032 is the IPC port that client code uses to submit applications to the ResourceManager.
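For instance, something like the following (just a sketch; the example jar and its arguments are only illustrative) submits a job through the RM's IPC port rather than the web UI port:

# The client talks to yarn.resourcemanager.address (default 8032),
# not to the 8088 web UI.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    pi -D yarn.resourcemanager.address=localhost:8032 2 5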

3) I changed the HDFS port to 9000. Is that standard?

The default port number is 8020. You can use any port you like, but I don't know whether setting it to 9000 counts as a standard; I have seen it in some vendor-provided Hadoop distributions other than Apache's.
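If it helps, one rough way to confirm which filesystem URI (and therefore which port) is actually in effect, assuming the Hadoop binaries are on your PATH:

# Prints the effective fs.defaultFS (fs.default.name is its deprecated alias);
# with the core-site.xml above this shows hdfs://localhost:9000. If the URI
# omits a port, the NameNode RPC default of 8020 is used.
hdfs getconf -confKey fs.defaultFS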

4) How do I see the ApplicationMaster, JobTracker, and TaskTracker ports?

I couldn't quite understand this question. If you intended to ask about the web UIs, we have already covered that in the answer to question 1.

5) I thought that in YARN (Hadoop 2) there is no JobTracker or TaskTracker, so what is the purpose of these ports?

As I understand it, YARN is a layer that sits between MapReduce and the storage layer for better management of resources and jobs. In Hadoop 2 the JobTracker and TaskTracker processes are gone: their work is split between the ResourceManager, the NodeManagers, and a per-job ApplicationMaster, with a separate JobHistoryServer keeping details of completed jobs. The jobtracker/tasktracker properties you see in mapred-default.xml are leftover MRv1 settings and are not used when jobs run on YARN.
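For example, on a pseudo-distributed Hadoop 2 node, jps shows only the YARN-era daemons (the PIDs below are just illustrative):

$ jps
2721 NameNode
2864 DataNode
3042 SecondaryNameNode
3310 ResourceManager
3421 NodeManager
3588 JobHistoryServer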

Someone can correct me if I went wrong somewhere.

Thanks and regards, Bibin

0 votes

Amit, there are lots of ports being monitored by Hadoop, and lots of Hadoop daemons; each one listens on specific ports (which you can override) for specific reasons. The documentation lists the ports and their purposes. For example YARN, the resource manager in Hadoop 2, listens on a port for job submission, yarn.resourcemanager.address. You can override that port (don't) in conf/yarn-site.xml. It also uses a port for its user interface, yarn.resourcemanager.webapp.address, and another for administrative commands, yarn.resourcemanager.admin.address. Likewise, HDFS serves 50070 as its web address and returns information about the file system there.

In general it's a good idea to leave the ports alone, since people learn the well-known port numbers and expect them (you wouldn't expect the default telnet or FTP port to move). Remember, YARN and classic MapReduce can operate on the same cluster; some distributions ship both.
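If you want to see which ports the daemons are actually listening on, something like this works on Linux (a rough sketch; the exact tool and flags vary by system):

# List listening TCP sockets owned by the Hadoop JVMs (root is needed to show
# process names); each daemon holds its RPC and HTTP ports open here.
sudo netstat -tlnp | grep java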

0 votes

Using Hadoop 2.6.5, the three main ports you are looking for are:

  • 8088 Cluster Metrics
  • 50070 HDFS/datanode health
  • 19888 History Server

In your Vagrantfile, open these three ports for port forwarding:

config.vm.network "forwarded_port", guest: 8088, host: 8088
config.vm.network "forwarded_port", guest: 19888, host: 19888
config.vm.network "forwarded_port", guest: 50070, host: 50070

The other ports are internal service-to-service ports and should not require any modification. You will need to run vagrant reload --provision to activate the forwarding.

In addition, you will need to change the value "localhost" to 0.0.0.0 in hadoop-2.6.5/etc/hadoop/yarn-site.xml in order to make the port forwarding for 8088 work. Make sure your DFS, YARN, and history services have started too.
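For reference, this is roughly how I start the services inside the guest and check the forwarding from the host (paths follow the hadoop-2.6.5 layout mentioned above; adjust to your install):

# Inside the guest: start HDFS, YARN, and the job history server.
hadoop-2.6.5/sbin/start-dfs.sh
hadoop-2.6.5/sbin/start-yarn.sh
hadoop-2.6.5/sbin/mr-jobhistory-daemon.sh start historyserver

# From the host: each forwarded port should now answer.
curl -s http://localhost:8088/    # cluster metrics (ResourceManager)
curl -s http://localhost:50070/   # HDFS health (NameNode)
curl -s http://localhost:19888/   # job history server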

0 votes

For Hadoop 3 it changed to:

HDFS health (NameNode web UI): http://localhost:9870/
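A quick check, assuming a local Hadoop 3 pseudo-distributed setup:

curl -s http://localhost:9870/    # NameNode web UI (HDFS health) in Hadoop 3
curl -s http://localhost:8088/    # ResourceManager web UI, unchanged from Hadoop 2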