How do I determine the best block size for Hadoop HDFS? For example, if I receive a 100MB file every minute, what would be the ideal HDFS block size for storage? 64MB? Should I also consider that the time to store each file must be under 1 minute? How could I calculate this? And which replication factor is best to use in this case, 2 or 3?
1 Answer
Which replication factor is best to use in this case, 2 or 3?
That depends on how durable your disks and datacenter are. The HDFS default of 3 tolerates the loss of two replicas; 2 saves storage at the cost of resilience.
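If you do decide on a non-default replication factor, it can be set per client or changed on an existing file rather than cluster-wide. A minimal Java sketch using the Hadoop FileSystem API; the dfs.replication property is real, but the paths are just illustrations:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster-wide default lives in hdfs-site.xml (dfs.replication, default 3);
        // this overrides it for files written by this client.
        conf.set("dfs.replication", "2");
        FileSystem fs = FileSystem.get(conf);

        // Replication can also be changed for an existing file after the fact.
        // The path below is hypothetical.
        fs.setReplication(new Path("/data/ingest/2024-01-01/file-0001.avro"), (short) 3);
        fs.close();
    }
}
```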
How do I determine the best block size for Hadoop HDFS?
The best block size is one that divides evenly into your largest expected file, so the last block isn't mostly empty. It doesn't need to be a power of two: for your 100MB files, a 100MB block size keeps each file in a single block, whereas 64MB splits each file into a 64MB block and a 36MB block.
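Note that block size is a per-file, write-time setting, not a cluster-wide constraint. A minimal sketch with the Hadoop FileSystem API, assuming an illustrative path and payload:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteWithBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        long blockSize = 128L * 1024 * 1024;   // 128 MB, the current HDFS default
        short replication = 3;
        int bufferSize = 4096;

        // This overload of create() lets you choose the block size for this file only.
        try (FSDataOutputStream out = fs.create(
                new Path("/data/ingest/file-0001.txt"),  // hypothetical path
                true,            // overwrite if it exists
                bufferSize,
                replication,
                blockSize)) {
            out.writeBytes("example payload\n");
        }
        fs.close();
    }
}
```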
Should I also consider that the time to store each file must be under 1 minute?
I would suggest you look at NiFi or StreamSets to pre-aggregate and compress the data before writing many 100MB files to HDFS every minute. Also, if that is actually 100MB of plaintext, then at least convert it to Avro or Parquet with Snappy compression first.
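If you end up hand-rolling that conversion instead of using NiFi or StreamSets, a minimal sketch with the Avro Java API and the Snappy codec could look like this; the single-field "text" schema and the file paths are assumptions for illustration:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.FileReader;

public class TextToAvro {
    public static void main(String[] args) throws Exception {
        // Assumed schema: each Avro record holds one line of the original plaintext file.
        Schema schema = SchemaBuilder.record("Line").fields()
                .requiredString("text").endRecord();

        FileSystem fs = FileSystem.get(new Configuration());
        Path out = new Path("/data/ingest/file-0001.snappy.avro"); // hypothetical path

        try (BufferedReader in = new BufferedReader(new FileReader("file-0001.txt"));
             DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.setCodec(CodecFactory.snappyCodec()); // Snappy-compressed Avro blocks
            writer.create(schema, fs.create(out));       // write the container file to HDFS

            String line;
            while ((line = in.readLine()) != null) {
                GenericRecord rec = new GenericData.Record(schema);
                rec.put("text", line);
                writer.append(rec);
            }
        }
        fs.close();
    }
}
```

In practice you would batch several minutes of input into one output file so you write fewer, larger files, which is exactly what NiFi or StreamSets can manage for you.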