HDFS supports storing files in compressed formats. I know that gzip compression doesn't support splitting. Imagine the file is a gzip-compressed file whose compressed size is 1 GB. Now my question is:
- How will this file be stored in HDFS (the block size is 64 MB)?
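My naive expectation is that HDFS stores the compressed bytes like any other file, so roughly 1024 MB / 64 MB = 16 blocks, and that only MapReduce input splitting is affected by gzip. To double-check the "not splittable" part I tried something like the sketch below (the class names are from the Hadoop client API as I understand it, and the file path is just a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;

public class CheckSplittable {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);

        // Resolve the codec from the file extension (.gz -> GzipCodec).
        // "/data/big-file.gz" is a made-up path for illustration.
        CompressionCodec codec = factory.getCodec(new Path("/data/big-file.gz"));

        // GzipCodec does not implement SplittableCompressionCodec,
        // which (as I understand it) is why a single mapper has to read
        // the whole gzip file.
        boolean splittable = codec instanceof SplittableCompressionCodec;
        System.out.println(codec.getClass().getSimpleName()
                + " splittable? " + splittable);
    }
}
```

Is this the right way to reason about it, or does the compression also change how the blocks themselves are laid out?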
From this link I came to know that the gzip format uses DEFLATE to store the compressed data, and that DEFLATE stores the data as a series of compressed blocks.
But I couldn't understand it completely and am looking for a broader explanation.
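What I understand so far is that a gzip reader has to start decompressing from the very beginning of the file, because you cannot seek into the middle of the DEFLATE stream and start inflating there. This is how I picture the read path (plain Java on a local file, just for illustration; the path is made up):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public class ReadGzip {
    public static void main(String[] args) throws IOException {
        // GZIPInputStream wraps an Inflater, which performs DEFLATE
        // decompression. Reading is strictly sequential: there is no way
        // to jump to an arbitrary offset and start inflating from there.
        try (GZIPInputStream in =
                 new GZIPInputStream(new FileInputStream("/tmp/big-file.gz"))) {
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            System.out.println("Uncompressed bytes read: " + total);
        }
    }
}
```

Please correct me if this mental model of "sequential-only decompression" is wrong.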
More doubts about this gzip-compressed file:
- How many blocks will there be for this 1 GB gzip-compressed file?
- Will it go onto multiple datanodes?
- How will the replication factor apply to this file (the Hadoop cluster's replication factor is 3)? I show after this list how I am trying to check this myself.
- What is the DEFLATE algorithm?
- Which algorithm is applied while reading the gzip-compressed file?
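For the block / datanode / replication questions, this is the sketch I am using to inspect the layout myself (assuming the standard FileSystem client API and a placeholder path; I may be misreading the output):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLayout {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/big-file.gz");   // placeholder path
        FileStatus status = fs.getFileStatus(file);

        System.out.println("Replication factor: " + status.getReplication());

        // One BlockLocation per HDFS block; getHosts() lists the datanodes
        // that hold a replica of that block.
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                + " length=" + b.getLength()
                + " hosts=" + String.join(",", b.getHosts()));
        }
    }
}
```

I also looked at the output of `hdfs fsck` with the `-files -blocks -locations` options on the file, but I could not fully interpret what it tells me about a compressed file versus an uncompressed one.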
I am looking for a broad and detailed explanation here.