I have two questions that should help me understand how HDFS handles blocks.
1. You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of the file, what would another user see when trying to access it?
A. They would see Hadoop throw a ConcurrentFileAccessException when they try to access the file.
B. They would see the current state of the file, up to the last bit written by the command.
C. They would see the current state of the file, through the last completed block.
D. They would see no content until the whole file has been written and closed.
As I see it, because the file is split into blocks, each block becomes available as soon as it has been written to HDFS, so my answer is C, but I would appreciate confirmation.
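In case it helps to check this empirically, here is a rough sketch of how one might watch the target path from a second session while the put is still in progress. The paths are made up, and the exact behaviour may vary between Hadoop versions:

    # Session 1: write a 300 MB file into HDFS (local and HDFS paths are hypothetical)
    hadoop fs -put /local/data/bigfile.dat /user/alice/bigfile.dat

    # Session 2, while the put is still running: what can another user observe?
    hadoop fs -ls /user/alice/                        # is the file (or a temporary copy) listed yet?
    hdfs fsck /user/alice/bigfile.dat -files -blocks  # which blocks exist so far?
    hadoop fs -cat /user/alice/bigfile.dat | wc -c    # how many bytes are readable right now?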
2. You need to move a file titled “weblogs” into HDFS. When you try to copy the file, you can’t. You know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?
A. Increase the block size on all current files in HDFS.
B. Increase the block size on your remaining files.
C. Decrease the block size on your remaining files.
D. Increase the amount of memory for the NameNode.
E. Increase the number of disks (or size) for the NameNode.
F. Decrease the block size on all current files in HDFS.
My thinking for this one is that the file is probably small enough to fit, but a much larger block would be allocated for it, so decreasing the block size would "defragment" some of the gaps. What I can't figure out is whether it would be right to do this for the remaining files or for all the files, or even whether my approach is correct at all.
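For what it's worth, the block size in HDFS is chosen per file when the file is written, so "changing the block size on remaining files" would in practice mean specifying it when loading them. A minimal sketch of how that could be done, assuming the shell's generic -D option is available (paths and the 32 MB value are just examples):

    # Inspect how many blocks the existing files occupy (path is hypothetical)
    hdfs fsck /user/alice -files -blocks | tail -n 20

    # Load "weblogs" with an explicit block size for just this one file
    # (value in bytes; 33554432 = 32 MB -- purely an example figure;
    #  the property is dfs.blocksize in Hadoop 2+, dfs.block.size in older releases)
    hadoop fs -D dfs.blocksize=33554432 -put weblogs /user/alice/weblogs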
Thank you!!