1 vote

My 1st question, I'll try not to screw up too bad :)

I am installing Hadoop 2.9.0 on a 4-node cluster for learning purposes. I started with the namenode installation/configuration, following the official Apache Hadoop 2.9.0 documentation and some Google pages.

I edited my hdfs-site.xml, located under the $HADOOP_HOME/etc/hadoop directory, like so:

  <!-- Where the namenode stores the fsimage and edit logs -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///apps/hdfs/namenode/data</value>
  </property>
  <!-- Where each datanode stores its block data -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///apps/hdfs/datanode/data</value>
  </property>
  <!-- Where the secondary namenode stores its checkpoint images -->
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///apps/hdfs/namesecondary/data</value>
  </property>

When I run "hadoop namenode -format", it formats the default $hadoop.tmp.dir under /tmp/hadoop-hadoop/...
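
For what it's worth, here is roughly how I checked which directory it was actually resolving (the getconf command is standard Hadoop; the output shown is what I believe the defaults expand to, not a verbatim capture):

    # Ask Hadoop which value it resolves for the namenode metadata directory
    hdfs getconf -confKey dfs.namenode.name.dir
    # I expected file:///apps/hdfs/namenode/data, but it came back with the
    # default file://${hadoop.tmp.dir}/dfs/name, i.e. /tmp/hadoop-hadoop/dfs/name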

I found some pages that said to set HADOOP_CONF_DIR to where the XML configuration files are (i.e. $HADOOP_HOME/etc/hadoop), but also some that said the opposite, to not set it at all.

In my case, setting it did fix my problem, but I'm not sure it's the right modification.
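
Concretely, the change that made it work for me was just this (paths are from my setup above; whether an export like this is the proper place to set it is exactly what I'm unsure about):

    # Tell Hadoop where my *-site.xml files live
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    # Re-running the format then used /apps/hdfs/namenode/data as expected
    hadoop namenode -format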

If anyone could help me out to understand this, it would be great :)

Thanks a lot!


1 Answer

0 votes

It's not really clear what problem you were having, but there is a default for HADOOP_CONF_DIR, set in a hadoop-env.sh file that is loaded before every hadoop command.

I believe this might simply be /etc/hadoop. At least, that's where most clusters store the configs.

Since your files are elsewhere, you needed to export that value to something else, which is fine.
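
As a rough sketch (treat the exact default line as an assumption, since it varies a bit between versions and packagings), the idea is that the shipped hadoop-env.sh carries a fallback, and you override it for your layout before running any command:

    # Fallback typically shipped in hadoop-env.sh (assumed, check your copy)
    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

    # Override for your layout, e.g. in ~/.bashrc or at the top of hadoop-env.sh
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

    # Sanity check that the right configs are being picked up
    hdfs getconf -confKey dfs.namenode.name.dir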

Personally, I suggest using Apache Ambari instead of manually installing and managing anything more than 2 nodes. It'll ensure your configs match across the cluster, and it additionally monitors the services.