0 votes

I'm not sure if I'm calculating this right. I'm using the Hadoop default settings and I want to work out how much data I can store in my cluster. For example, I have 12 nodes, with 8 TB of total disk space per node allocated to HDFS storage.

Do I just calculate 12/8 = 1.5 TB?


1 Answer

2 votes

You're not accounting for the replication factor, or for the overhead needed to process any of that data. Plus, Hadoop won't run properly once the disks get close to full.

Therefore, the 8 TB per node would first be divided by 3, the default replication factor (assuming the newer Erasure Coding feature is not enabled), and then multiplied by the number of nodes.

However, you can't technically hit 100% HDFS usage, because services will start failing once disk usage goes above roughly 85%. So realistically, your per-node starting number should be about 7 TB (85% of 8 TB ≈ 6.8 TB) rather than the full 8 TB.
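To make that concrete, here's a minimal sketch of the arithmetic in Python. It assumes the default 3x replication and treats ~85% as the practical fill limit; the exact threshold and overhead depend on your configuration, so treat the numbers as ballpark figures.

```python
# Rough capacity estimate using the numbers from the question.
# The 3x replication and ~85% usable-disk fraction are assumptions
# based on common Hadoop defaults, not exact guarantees.

nodes = 12                 # DataNodes in the cluster
raw_per_node_tb = 8.0      # disk allocated to HDFS per node
replication_factor = 3     # HDFS default (dfs.replication)
usable_fraction = 0.85     # leave headroom so DataNodes don't fill up

# Disk you can realistically let HDFS fill per node (~7 TB here)
effective_per_node_tb = raw_per_node_tb * usable_fraction

# Total space across the cluster, then divide by replication
total_effective_tb = effective_per_node_tb * nodes
usable_data_tb = total_effective_tb / replication_factor

print(f"Effective per-node space: {effective_per_node_tb:.1f} TB")
print(f"Cluster space after headroom: {total_effective_tb:.1f} TB")
print(f"Approx. unique data you can store: {usable_data_tb:.1f} TB")
# -> roughly 27 TB of actual data, not the 96 TB of raw disk
```

So with 12 nodes at 8 TB each, you end up with around 27 TB of unique data, not 96 TB, and not the 1.5 TB from dividing nodes by disk size.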