0 votes

I know that data uploaded into HDFS is replicated across datanodes in a Hadoop cluster as blocks. My question is: what happens when the combined capacity of all the datanodes in the cluster is insufficient? For example, I have 3 datanodes, each with 10 GB of capacity (30 GB altogether), and I want to insert 60 GB of data into HDFS on the same cluster. I don't see how the 60 GB of data can be split into blocks (typically ~64 MB) and accommodated by the datanodes.
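
To make the numbers concrete, here is the back-of-the-envelope arithmetic I have in mind (the 64 MB block size and the replication factor of 3 are just assumed defaults, not values I've checked on the cluster):

    # Rough back-of-the-envelope check, not HDFS code; the block size and
    # replication factor below are assumed defaults, not measured values.
    GB = 1024 ** 3
    MB = 1024 ** 2

    data_size = 60 * GB              # data I want to upload
    cluster_capacity = 3 * 10 * GB   # 3 datanodes x 10 GB each
    block_size = 64 * MB             # typical HDFS block size
    replication = 3                  # assumed default dfs.replication

    num_blocks = -(-data_size // block_size)   # ceiling division -> 960 blocks
    raw_needed = data_size * replication       # space needed once replicas are counted

    print(f"{num_blocks} blocks, {raw_needed // GB} GB needed "
          f"vs {cluster_capacity // GB} GB available")

Even before counting replicas, 60 GB doesn't fit into 30 GB, and with replication the gap is even wider.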

Thanks


2 Answers

1 vote

I haven't tested it, but it should fail with an out-of-storage error. As each block of data is written into HDFS, it goes through the replication-factor process, so your upload would get about halfway through and then die.
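
If you want to confirm this before starting the upload, a rough check like the sketch below could compare the file size (times the replication factor) against the cluster's remaining DFS space. It assumes hdfs is on the PATH and that the "hdfs dfsadmin -report" summary prints a "DFS Remaining:" line in bytes, which may vary between Hadoop versions:

    import re
    import subprocess

    # Sketch: ask the namenode how much DFS space is left before a big put.
    # Assumes `hdfs` is on the PATH and that the report's cluster summary
    # contains a line like "DFS Remaining: 12345678 (...)"; the exact output
    # format can differ between Hadoop versions.
    report = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        capture_output=True, text=True, check=True,
    ).stdout

    match = re.search(r"DFS Remaining:\s*(\d+)", report)
    if match:
        remaining = int(match.group(1))        # bytes left across the cluster
        file_size = 60 * 1024 ** 3             # size of the file to upload
        replication = 3                        # assumed dfs.replication
        if file_size * replication > remaining:
            print("Not enough space: the upload would die partway through.")
        else:
            print("The file should fit.")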

That said, you could gzip the data (with high compression) before the upload and potentially squeeze it in, depending on how compressible the data is.
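
For example, a quick local gzip pass before the put might look like the sketch below (the file names are just placeholders):

    import gzip
    import shutil

    # Compress the local file before uploading; how much this saves depends
    # entirely on how compressible the data is. File names are placeholders.
    with open("bigdata.csv", "rb") as src, \
         gzip.open("bigdata.csv.gz", "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)

    # Then upload the smaller file, e.g.:
    #   hdfs dfs -put bigdata.csv.gz /data/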

0 votes

I ran into this issue when I was trying to move a large file from the local filesystem to HDFS. The transfer got stuck in the middle, Java reported an out-of-space error, the move/copy command was cancelled, and all the blocks of the file that had already been copied to HDFS were deleted.

So that means we can't copy a single file larger than the total HDFS capacity of the cluster.