
The core-site.xml file tells the Hadoop daemons where the NameNode runs in the cluster. It contains the configuration settings for Hadoop Core, such as I/O settings common to HDFS and MapReduce.
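As a sketch, a minimal core-site.xml might look like this (the host name and port are placeholders, not values from the question):

```xml
<configuration>
  <!-- fs.defaultFS tells daemons and clients where the NameNode listens -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```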

The hdfs-site.xml file contains the configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode, and the DataNodes. Here we can configure hdfs-site.xml to specify the default block replication and permission checking on HDFS. The actual number of replicas can also be specified when a file is created; the default is used if no replication factor is given at create time.
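For illustration, a hdfs-site.xml fragment covering the two settings mentioned above might look like this (the values shown are just the common defaults):

```xml
<configuration>
  <!-- dfs.replication: default number of block replicas, used when
       no replication factor is specified at file-create time -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- dfs.permissions.enabled: toggles HDFS permission checking -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>true</value>
  </property>
</configuration>
```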

I'm looking to understand which processes [Namenode, Datanode, HDFS client] need access to which of those configuration files?

  • Namenode: I presume it only needs hdfs-site.xml because it doesn't need to know its own location.
  • Datanode: I presume it needs access to both core-site.xml (to locate the namenode) and hdfs-site.xml (for various settings)?
  • HDFS client: I presume it needs access to both core-site.xml (to locate the namenode) and hdfs-site.xml (for various settings)?

Is that accurate?


2 Answers


The client and server processes need access to both files.

If you use HDFS nameservices with highly available NameNodes, then the two NameNodes also need to be able to find each other.
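A sketch of the hdfs-site.xml entries involved in an HA nameservice setup (the nameservice ID `mycluster` and the host names are placeholders):

```xml
<configuration>
  <!-- Logical name of the HA nameservice -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- NameNode IDs within the nameservice -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of each NameNode, so they (and clients) can find each other -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
</configuration>
```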


Some comments:

  • core-site.xml and hdfs-site.xml are also the two files used by external
    programs (such as NiFi) to access the cluster / WebHDFS API
  • Edge nodes require both for cluster access

  • Ambari will manage both of these files, along with all the others

  • The three processes you listed all need access to both files in order to run the cluster and, at a bare minimum, to pick up basic settings such as proxy settings and cluster access