I am using Apache Hadoop 2.7.1 on a cluster that consists of three nodes:
nn1 (master name node)
nn2 (second name node)
dn1 (data node)
We know that if we configure high availability in this cluster,
we will have two name nodes: one active and one standby.
If we also configure the cluster to be addressed by a name service, the following scenario should work.
The scenario is:
1- nn1 is active and nn2 is standby.
To get a file (called myfile) from dn1, we can send this URL from a browser (a WebHDFS request):
http://nn1/webhdfs/v1/hadoophome/myfile/?user.name=root&op=OPEN
2- The name node daemon on nn1 is killed, so under high availability nn1 becomes standby and nn2 becomes active. We can now get myfile by sending the same web request to nn2, because it is the active node:
http://nn2/webhdfs/v1/hadoophome/myfile/?user.name=root&op=OPEN
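
The two requests above amount to the client trying each name node in turn until one answers. Here is a minimal Python sketch of that manual failover. The function names are hypothetical, the host names are used without a port exactly as in the URLs above, and it assumes that a stopped or standby name node makes the request fail with a connection or HTTP error.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_open_url(host, path, user):
    """Build a WebHDFS OPEN URL for one name node host."""
    query = urlencode({"user.name": user, "op": "OPEN"})
    return "http://{}/webhdfs/v1{}?{}".format(host, quote(path), query)


def http_get(url):
    """Default fetcher: return the response body (urlopen follows redirects)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def read_from_any_namenode(hosts, path, user, fetch=http_get):
    """Try each name node in order; return (host, data) from the first that answers."""
    errors = []
    for host in hosts:
        url = webhdfs_open_url(host, path, user)
        try:
            return host, fetch(url)
        except OSError as exc:
            # URLError and HTTPError are OSError subclasses, so a dead node
            # or an HTTP error response both fall through to the next host.
            errors.append("{}: {}".format(host, exc))
    raise RuntimeError("no name node served the request: " + "; ".join(errors))
```

Calling read_from_any_namenode(["nn1", "nn2"], "/hadoophome/myfile", "root") would try nn1 first and fall back to nn2. This puts the failover logic in every client, which is the part a single fixed endpoint would remove.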
So configuring a name service with high availability seems to be enough to handle a name node failure and keep WebHDFS working.
So what is the benefit of adding HttpFS here? I ask because of this comment:
"WebHDFS with high availability is not supported and we have to configure HttpFs to get that done." – franklinsijo