0 votes

I am currently running a cluster with 2 nodes. One node is a master/slave and the other one is just a slave. I have a file, and I set the block size to half the size of that file. Then I do

hdfs dfs -put file /

The file gets copied to HDFS without a problem, but when I check the HDFS web UI, I see that both of the blocks that were created are on one datanode (the blocks are on the datanode where I ran the -put command). I even tried running the balancer script, but both blocks are still on the same datanode.

I need the data to be spread out as evenly as possible across all nodes.

Am I missing something here?
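For reference, this is roughly what I am doing; a sketch of the commands (the 64 MB block size below is only an illustrative value, not my actual setting):

# Upload with an explicit per-file block size (here 64 MB as an example).
hdfs dfs -D dfs.blocksize=67108864 -put file /

# Show each block of the file and the datanodes holding its replicas.
hdfs fsck /file -files -blocks -locations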

What does hdfs dfs -ls /file say? – jlliagre
The file will be a plain text file. I'm not sure I understand your question. – Instinct
You are indeed misunderstanding my question. Let me rephrase it: can you post the result of the command hdfs dfs -ls /file? – jlliagre
Sorry for the late response, I just got to work. Here is what you requested: bash-4.1$ hdfs dfs -ls /input/data1.txt 15/03/09 08:51:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable -rw-r--r-- 1 blahblah supergroup 390 2015-03-06 16:57 /input/data1.txt – Instinct

2 Answers

1 vote

As the hdfs dfs -ls output shows, your replication factor is set to 1, so there is no compelling reason for HDFS to distribute the data blocks across the datanodes.

You need to increase the replication factor to at least 2 to get what you expect, e.g.:

hdfs dfs -setrep 2 /input/data1.txt
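
As a quick check afterwards (a sketch, assuming the /input/data1.txt path from your comment), you can wait for the replication to complete and then confirm where the replicas ended up:

# -w waits until the target replication factor is actually reached.
hdfs dfs -setrep -w 2 /input/data1.txt

# List each block of the file and the datanodes holding its replicas.
hdfs fsck /input/data1.txt -files -blocks -locations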
0 votes

When writing data into HDFS, the first replica of each block is placed on the node where the hadoop put command is executed (if that node runs a datanode), in order to save bandwidth and network round trips. Since your replication factor is 1, only that local copy exists, which is why both blocks end up on the same datanode.
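
To illustrate the point (a sketch, reusing the data1.txt file name from the comments):

# Run on a machine that hosts a datanode: with replication 1, the single copy
# of every block is written to that local datanode.
hdfs dfs -put data1.txt /input/

# Run from a machine that is only an HDFS client (no datanode running): the
# NameNode picks a datanode for each block, so the data is no longer pinned
# to one node.
hdfs dfs -put data1.txt /input/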