1
votes

I have just configured a Hadoop 2.7.3 cluster. I load data ranging from 1 GB up to 20 GB and can use it (manipulate it, etc.), but after I restart the cluster the data is no longer accessible. I get this message: WARNING : There are about xx missing blocks. Please check the log or run fsck. It means that some blocks in my HDFS installation do not have a single replica on any of the live DataNodes. Here is the hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hduser/hadoop-2.7.3/namenode</value>
        <description>NameNode directory for namespace and transaction logs storage.</description>
    </property>
   <property>
    <name>dfs.safemode.threshold.pct</name>
    <value>0</value>
   </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>
    <property>
         <name>dfs.namenode.http-address</name>
         <value>node1:50070</value>
         <description>Your NameNode hostname for http access.</description>
    </property>
    <property>
         <name>dfs.namenode.secondary.http-address</name>
         <value>node1:50090</value>
         <description>Your Secondary NameNode hostname for http access.</description>
    </property>
</configuration>
You are running with no replication. That means that if any DataNode goes down, there will be corrupt files (missing blocks). Is this a test config? Are you running multiple DataNodes? - jeff
Yes, I am using multiple DataNodes, and I set the replication parameter to 3 but it does not work; when I restart my cluster the data is still not accessible. - inoubli
You may also want to configure the safemode.threshold to something higher (default is 0.99). My guess is during the restart, the DataNodes have not checked in with NameNode and since you have safemode disabled, you're getting missing block errors. - jeff
Thank you for your suggestion. Can you give me a value to set for the safemode parameter? - inoubli
I'd try the default (0.99) - jeff
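
Following the suggestion in the comments, a minimal sketch of what restoring the safemode threshold could look like in hdfs-site.xml (0.99 is the Hadoop default; the property name is kept the same as in the question's config):

<property>
    <!-- fraction of blocks that must be reported before the NameNode leaves safe mode -->
    <name>dfs.safemode.threshold.pct</name>
    <value>0.99</value>
</property>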

1 Answer

0
votes

The default value of the property dfs.datanode.data.dir is ${hadoop.tmp.dir}/dfs/data, and hadoop.tmp.dir defaults to /tmp, which gets cleaned up on reboot, so all your blocks are lost.

You have to add this property to hdfs-site.xml on all the DataNodes:

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hduser/hadoop-2.7.3/datanode</value>
</property>
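
Alternatively, a sketch of another way to get the same effect: point hadoop.tmp.dir itself at a persistent directory in core-site.xml, so everything derived from it (including the DataNode block directories) survives a reboot. The path below is only an example chosen to match the question's layout; adjust it to your environment.

<property>
    <!-- example persistent location instead of the default /tmp; hypothetical path -->
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/hadoop-2.7.3/tmp</value>
</property>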