With default Hadoop settings, how long would it take to write a 64 MB file into HDFS, assuming it takes 4 minutes to write one block?
My understanding: with the default block size of 64 MB, the client has to write a single block, which should take 4 * 3 (replication factor) = 12 minutes.
Reasoning:
HDFS uses pipelining to achieve replicated writes. When the client receives the list of DataNodes from the NameNode, it streams the block data to the first DataNode (4 minutes), which in turn mirrors the data to the next DataNode (another 4 minutes), and so on until the data has reached all of the DataNodes (4 minutes again). Acknowledgements from the DataNodes are pipelined back in reverse order.
4 + 4 + 4 = 12 minutes
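To make the assumption explicit, here is a minimal sketch of the arithmetic above. It models each hop in the write pipeline as fully sequential (each DataNode starts only after the previous one has received the whole block); the variable names are illustrative, not anything from the HDFS codebase:

```python
# Back-of-envelope timing under a fully sequential model of the
# HDFS write pipeline (client -> DN1 -> DN2 -> DN3, one hop at a time).
block_write_minutes = 4   # given: time to stream one 64 MB block to one node
replication_factor = 3    # Hadoop's default replication factor

# Each of the 3 replicas takes a full 4-minute hop in this model.
sequential_total = block_write_minutes * replication_factor
print(sequential_total)   # 12
```

Whether the hops really are sequential (rather than overlapping packet by packet) is exactly what the question hinges on.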
Can someone confirm whether my understanding is correct?