2 votes

I am using Hadoop to process a large set of data. I set up a Hadoop node to use multiple volumes: one of these volumes is a NAS with a 10 TB disk, and the other is the server's local disk with a storage capacity of 400 GB.
The problem is, if I understand correctly, that datanodes try to place an equal amount of data in each volume. So when I run a job on a large data set, the 400 GB disk fills up quickly while the 10 TB disk still has plenty of free space. Then the MapReduce jobs produced by Hive freeze because my cluster goes into safe mode...
I tried to set the property that limits the datanode's disk usage, but it did nothing: I still have the same problem. I hope someone can help me.

Well, it seems that my MapReduce job triggers safe mode because:

The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.

I saw that error on the namenode web interface. I want to disable this check with the property dfs.safemode.threshold.pct, but I do not know whether that is a good way to solve it.
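For reference, this is roughly what I have in mind for hdfs-site.xml (the value 0 is just what I understand would skip the block threshold check, I am not sure it is the right approach):

<property>
  <name>dfs.safemode.threshold.pct</name>
  <!-- as far as I understand, a value <= 0 means the namenode does not wait
       for any percentage of blocks to be reported before leaving safe mode -->
  <value>0</value>
</property>

I know I can also force the cluster out of safe mode by hand with hadoop dfsadmin -safemode leave, but that does not fix the underlying problem.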

2
I'm not sure if I understand your question correctly, but I get the impression you haven't fully understood HDFS. In my opinion your servers should share nothing. The local disk storage is used by the datanode, but the NAS shouldn't be used. Another very important question is: how many datanodes do you have? – khmarbaise
I have four datanodes. I am using the NAS because the output from my programs is too big for the local disk. I expected that Hadoop could deal with dfs.data.dir devices of different sizes. – C. Oran
I am using Hadoop 0.20.203.0 and Hive 0.7.1. – C. Oran

2 Answers

0 votes

I think you can turn to dfs.datanode.fsdataset.volume.choosing.policy for help.

<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
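If your Hadoop version ships this policy (older 0.20.x releases do not, as far as I know; it was added via HDFS-1804 and is available in 2.x), you can also tune when it kicks in. A sketch of the two related settings, with illustrative values close to the defaults:

<property>
  <!-- volumes whose free space differs by less than this many bytes
       are considered balanced (example: 10 GB) -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<property>
  <!-- fraction of new block writes sent to the volumes with more free space
       when the volumes are out of balance -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>

Check hdfs-default.xml for your release to confirm the property names and defaults.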

0 votes

Use the dfs.datanode.du.reserved configuration setting in $HADOOP_HOME/conf/hdfs-site.xml to limit disk usage.

Reference

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- cluster variant -->
  <value>182400</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.</description>
</property>
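Note that 182400 bytes is only about 180 KB, so a value that small will not keep a 400 GB disk from filling up. To hold back a meaningful amount of space you would set something much larger, for example (illustrative value, roughly 50 GB):

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- about 50 GB kept free per volume for non-DFS use -->
  <value>53687091200</value>
</property>

Keep in mind the reservation applies per volume, so the same amount is also held back on the 10 TB NAS.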