
I have 2 questions that will help me understand how HDFS works in the context of blocks.

1. You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of the file, what would another user see when trying to access it?

A. They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.

B. They would see the current state of the file, up to the last bit written by the command.

C. They would see the current state of the file through the last completed block.

D. They would see no content until the whole file is written and closed.

As I see it, because the file is split into blocks, each block becomes available in HDFS as soon as it is written, so my answer is C, but I would like verification of this...
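The block arithmetic behind the question can be sketched in a few lines. This is a toy calculation, not actual HDFS code: a 300 MB file with a 64 MB block size splits into four full blocks plus one 44 MB partial block, and after 200 MB written, only the first three blocks are complete.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size):
    # Toy model of how HDFS divides a file into fixed-size blocks:
    # full blocks first, then one partial block for any remainder.
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

blocks = split_into_blocks(300 * MB, 64 * MB)
print(len(blocks))            # 5 blocks in total
print(blocks[-1] // MB)       # the last block holds only 44 MB

# After 200 MB have been written, only complete blocks are finalized:
written = 200 * MB
print(written // (64 * MB))   # 3 blocks fully written so far
```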

2. You need to move a file titled “weblogs” into HDFS. When you try to copy the file, you can’t. You know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?
A. Increase the block size on all current files in HDFS.

B. Increase the block size on your remaining files.

C. Decrease the block size on your remaining files.

D. Increase the amount of memory for the NameNode.

E. Increase the number of disks (or size) for the NameNode.

F. Decrease the block size on all current files in HDFS.

My approach for this one is that the file is probably small enough to fit, but a much larger block will be allocated for it, so decreasing the block size would "defragment" some of the gaps. I can't figure out, though, whether it would be better to do this for the remaining files or for all files, or even whether my approach is correct at all.

Thank you!!


2 Answers

  1. If the writer has not called hflush(), then the reader will see an error, as the block has not been finalized yet. So I will go with D.

Here are two links on this: https://issues.apache.org/jira/browse/HDFS-1907 and "Hadoop HDFS: Read sequence files that are being written"

  2. One of the errors in this situation is that the NameNode is not aware of the space in HDFS. So I will go with E in this case.

Link: "error while copying the files from local file system to HDFS in Hadoop"
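To make the NameNode-resource concern concrete, here is a rough back-of-the-envelope sketch. It assumes the commonly cited rule of thumb from Hadoop: The Definitive Guide that each file and each block object costs on the order of 150 bytes of NameNode heap; the exact figure varies by Hadoop version.

```python
OBJECT_BYTES = 150  # assumed rough cost per file/block object in NameNode heap

def namenode_heap(num_files, blocks_per_file, object_bytes=OBJECT_BYTES):
    # Each file contributes one file object plus one object per block.
    return num_files * (1 + blocks_per_file) * object_bytes

# Ten million single-block files vs. the same number of blocks packed
# into 100,000 larger files:
small = namenode_heap(10_000_000, 1)
large = namenode_heap(100_000, 100)
print(small)  # 3000000000 bytes, about 3 GB of heap
print(large)  # 1515000000 bytes, about half that for the same block count
```

This is why running out of NameNode capacity can block new files even when DataNode disks have ample space.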


For the first question, see the discussion in another SO question. In that discussion, the answer could be either C or D, depending on what the question is trying to ask. Files are copied block by block, and there is technically a way to see the file as it is being written, through the last completed block, but only under a file with a different name.

For the second, one approach (answer C) is to have the remaining files fill the gaps between the blocks of the files that already exist. However, your assumption that small files have large blocks allocated to them is incorrect; files only take up as much space as they need. According to Hadoop: The Definitive Guide:

Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.
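The quoted point can be illustrated with a toy calculation (the 128 MB figure is an assumed block size for illustration, a common default in later Hadoop versions): a 1 MB file consumes 1 MB of DataNode disk, not 128 MB, while still costing one block's worth of NameNode metadata.

```python
MB = 1024 * 1024
block_size = 128 * MB   # assumed block size, for illustration only
file_size = 1 * MB

disk_used = file_size                         # actual bytes on DataNode disks
blocks_tracked = -(-file_size // block_size)  # ceiling division

print(disk_used // MB)   # 1 -> only 1 MB of storage is consumed
print(blocks_tracked)    # 1 -> but the NameNode still tracks one block
```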