
Here I am assuming that I have a single cluster of 4 nodes and a data volume of 500 GB. In Hadoop 1, with the default block size (64 MB), how will the data blocks be assigned to the nodes? I am also assuming a replication factor of 3.

My understanding: if I have 200 MB of data, then in Hadoop 1 with the default block size (64 MB) the data is split into 4 blocks (64 + 64 + 64 + 8 MB), and all four blocks, along with their replicas, will be spread across the four nodes.

I have added a picture to show my understanding. If my understanding is correct, how will it work for 500 MB of data? If not, please help me understand.

[Image: My understanding of HDFS]
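To check my arithmetic, here is a tiny Java sketch of how I think a file is split into blocks (plain arithmetic only, not output from a real cluster):

    public class BlockSplit {
        // Split a file into HDFS-style blocks: full blocks of blockSizeMb,
        // plus one smaller final block holding the remainder.
        static void printBlocks(long fileSizeMb, long blockSizeMb) {
            long fullBlocks = fileSizeMb / blockSizeMb;
            long remainder  = fileSizeMb % blockSizeMb;
            long total      = fullBlocks + (remainder > 0 ? 1 : 0);
            System.out.println(fileSizeMb + " MB -> " + total + " blocks ("
                    + fullBlocks + " x " + blockSizeMb + " MB"
                    + (remainder > 0 ? " + 1 x " + remainder + " MB" : "") + ")");
        }

        public static void main(String[] args) {
            // 200 MB -> 4 blocks (3 x 64 MB + 1 x 8 MB), matching my picture.
            printBlocks(200, 64);
        }
    }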


1 Answer


Yes, your understanding is correct. The default block size in HDFS is 64 MB in version 1.x and 128 MB in 2.x. If the last block is not full, it is stored as it is and only occupies its actual size. You can also configure the block size if you need to.
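By the same arithmetic, a 500 MB file with a 64 MB block size is split into 7 full blocks plus one 52 MB block, i.e. 8 blocks in total, and each of them is then replicated. If you do want a different block size or replication factor, here is a minimal sketch of setting them through the Hadoop Java API (the values and file paths below are just placeholders; in practice these settings usually live in hdfs-site.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsConfigExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Block size in bytes: 128 MB in this example.
            // (dfs.block.size is the old 1.x property name, dfs.blocksize the 2.x name.)
            conf.setLong("dfs.blocksize", 128L * 1024 * 1024);

            // Replication factor for files written with this configuration.
            conf.setInt("dfs.replication", 3);

            FileSystem fs = FileSystem.get(conf);

            // Placeholder paths: copy a local file into HDFS; it will be split
            // into blocks of the size configured above and replicated 3 times.
            fs.copyFromLocalFile(new Path("/tmp/data-500mb.dat"),
                                 new Path("/user/demo/data-500mb.dat"));
        }
    }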


The default replication factor is 3, but it can also be changed in the configuration. If you have rack awareness configured, the replicas of each block are placed as follows (a way to check the actual placement is sketched after the list):

  • The first replica is placed on some node
  • The second replica is placed on a different node on the same rack as the first
  • The third replica is placed on a node on a different rack
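If you want to verify where the blocks and their replicas actually landed, here is a sketch using the same Java API (the path is again a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/user/demo/data-500mb.dat"));

            // One BlockLocation per block; each lists the datanodes holding a replica.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (int i = 0; i < blocks.length; i++) {
                System.out.println("block " + i + ": "
                        + String.join(", ", blocks[i].getHosts()));
            }
        }
    }

From the command line, hdfs fsck <path> -files -blocks -locations shows similar information.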


For more details, you can check this article.