
The core-site.xml file tells the Hadoop daemons where the NameNode runs in the cluster. It contains the configuration settings for Hadoop Core, such as I/O settings common to HDFS and MapReduce.
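As a sketch, a minimal core-site.xml might look like this (the host name and port are placeholders, not values from the question):

```xml
<configuration>
  <!-- fs.defaultFS tells daemons and clients where the NameNode listens -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```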

The hdfs-site.xml file contains the configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode, and the DataNodes. Here we can configure hdfs-site.xml to specify the default block replication and permission checking on HDFS. The actual number of replicas can also be specified when a file is created; the default is used if no replication factor is given at create time.
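For illustration, a hdfs-site.xml fragment covering the two settings mentioned above might look like this (the values shown are just the common defaults):

```xml
<configuration>
  <!-- dfs.replication: default number of block replicas, used when
       no replication factor is specified at file-create time -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- dfs.permissions.enabled: toggles HDFS permission checking -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>true</value>
  </property>
</configuration>
```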

I'm looking to understand which processes [Namenode, Datanode, HDFS client] need access to which of those configuration files?

  • Namenode: I presume it only needs hdfs-site.xml because it doesn't need to know its own location.
  • Datanode: I presume it needs access to both core-site.xml (to locate the namenode) and hdfs-site.xml (for various settings)?
  • HDFS client: I presume it needs access to both core-site.xml (to locate the namenode) and hdfs-site.xml (for various settings)?

Is that accurate?


2 Answers


The client and server processes need access to both files.

If you use HDFS nameservices with highly available NameNodes, then the two NameNodes also need to be able to find each other.
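A sketch of the hdfs-site.xml entries involved in an HA nameservice setup (the nameservice ID `mycluster` and the host names are placeholders):

```xml
<configuration>
  <!-- Logical name of the HA nameservice -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- NameNode IDs within the nameservice -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of each NameNode, so they (and clients) can find each other -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
</configuration>
```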


Some comments:

  • core-site.xml and hdfs-site.xml are also the two files used by external
    programs (such as NiFi) to access the cluster / WebHDFS API
  • Edge nodes require both for cluster access

  • Ambari will manage both of these files, along with all the others

  • The three processes you listed all need access to both files in order to run the cluster and, at a bare minimum, to pick up basic settings such as proxy settings and cluster access