I have a text file containing 100 words separated by spaces, and I need to write a word count program.
When the file is split into HDFS blocks, how can we be sure that each split falls at the end of a word?
For example, suppose the 50th word in the text file is Hadoop. While the file is being split into 64 MB blocks, the current block might reach 64 MB in the middle of that word, so one block would end with 'Had' and the next block would begin with 'oop'.
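To make the concern concrete, here is a toy Python sketch of the situation I am worried about. It is not HDFS code: the helper `split_into_blocks`, the sample text, and the 17-byte "block size" are all made up for illustration, and it only assumes that blocks are cut purely by byte count.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut data into consecutive fixed-size chunks, ignoring word boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# Hypothetical sample text and a tiny block size standing in for 64 MB.
text = b"word count on Hadoop cluster"
blocks = split_into_blocks(text, 17)

for number, block in enumerate(blocks):
    print(number, block)
```

If the real split works like this, the first block ends with 'Had' and the second begins with 'oop', so a word count run separately on each block would treat them as two different words.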
Sorry if the question sounds silly, but I would appreciate an answer. Thanks.