I am using Hadoop to process a large data set. I set up a Hadoop node to use multiple volumes: one of these volumes is a NAS with a 10 TB disk, and the other is the server's local disk with a capacity of 400 GB.
The problem is, if I understand correctly, that the datanode tries to place an equal amount of data on each volume. So when I run a job on a large data set, the 400 GB disk fills up quickly, while the 10 TB disk still has plenty of space left. Then the MapReduce job produced by Hive freezes because my cluster goes into safe mode...
I tried to set the property that limits the datanode's disk usage, but it does nothing: I still have the same problem.
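For reference, the only property I know of for this is `dfs.datanode.du.reserved`, which reserves space per volume rather than capping usage; this is a sketch of what I tried in `hdfs-site.xml` (the 50 GB value is just an example, not my real setting):

```xml
<!-- hdfs-site.xml: reserve ~50 GB of non-HDFS space on each volume.
     Note: this applies to EVERY volume, so it does not balance
     a small disk against a large one. Value is in bytes. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>53687091200</value>
</property>
```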
I hope someone can help me.
Well, it seems that my MapReduce job triggers safe mode because:
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
I saw that error on the NameNode web interface. I want to relax this check with the property dfs.safemode.threshold.pct, but I do not know whether that is a good way to solve it.
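In case it matters, this is what I was planning to put in `hdfs-site.xml` (a value of 0 would disable the block-report check entirely, which is what worries me):

```xml
<!-- hdfs-site.xml: fraction of blocks that must be reported
     before the NameNode leaves safe mode (default 0.999).
     Setting it below the default weakens the safety check. -->
<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0.5</value>
</property>
```

Alternatively, I know I can force the NameNode out of safe mode manually with `hdfs dfsadmin -safemode leave`, but that feels like treating the symptom rather than the cause (the full 400 GB disk).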