0 votes

I am currently running a cluster with 2 nodes. One node is a master/slave and the other one is just a slave. I have a file, and I set the block size to half the size of that file. Then I do

hdfs dfs -put file /

The file gets copied to HDFS without a problem, but when I check the HDFS web UI, I see that both of the blocks that were created are on one datanode (the blocks are on the datanode where I ran the -put command). I even tried running the balancer script, but both blocks are still on the same datanode.

I need the data to be spread out as evenly as possible across all nodes.

Am I missing something here?
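For reference, this is roughly what I am doing; a sketch of the commands (the 64 MB block size below is only an illustrative value, not my actual setting):

# Upload with an explicit per-file block size (here 64 MB as an example).
hdfs dfs -D dfs.blocksize=67108864 -put file /

# Show each block of the file and the datanodes holding its replicas.
hdfs fsck /file -files -blocks -locations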

What does hdfs dfs -ls /file say? – jlliagre
The file will be a plain text file. I'm not sure I understand your question. – Instinct
You are indeed misunderstanding my question. Let me rephrase it: can you post the result of the command hdfs dfs -ls /file? – jlliagre
Sorry for the late response, I just got to work. Here is what you requested: bash-4.1$ hdfs dfs -ls /input/data1.txt 15/03/09 08:51:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable -rw-r--r-- 1 blahblah supergroup 390 2015-03-06 16:57 /input/data1.txt – Instinct

2 Answers

1 vote

As the hdfs dfs -ls output shows, your replication factor is set to 1, so there is no compelling reason for HDFS to distribute the data blocks across the datanodes.

You need to increase the replication factor to at least 2 to get what you expect, e.g.:

hdfs dfs -setrep 2 /input/data1.txt
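
As a quick check afterwards (a sketch, assuming the /input/data1.txt path from your comment), you can wait for the replication to complete and then confirm where the replicas ended up:

# -w waits until the target replication factor is actually reached.
hdfs dfs -setrep -w 2 /input/data1.txt

# List each block of the file and the datanodes holding its replicas.
hdfs fsck /input/data1.txt -files -blocks -locations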
0 votes

When writing data into HDFS, the first replica of each block is placed on the node where the hadoop put command is executed (if that node runs a datanode), in order to save bandwidth and network round trips. Since your replication factor is 1, only that local copy exists, which is why both blocks end up on the same datanode.
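
To illustrate the point (a sketch, reusing the data1.txt file name from the comments):

# Run on a machine that hosts a datanode: with replication 1, the single copy
# of every block is written to that local datanode.
hdfs dfs -put data1.txt /input/

# Run from a machine that is only an HDFS client (no datanode running): the
# NameNode picks a datanode for each block, so the data is no longer pinned
# to one node.
hdfs dfs -put data1.txt /input/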