In the latest versions of most Hadoop distributions, the HDFS usage reports seem to show raw disk space without accounting for the replication factor, correct?
When I look at the NameNode web UI or run the 'hadoop dfsadmin -report' command, I see a report that looks something like this:
Configured Capacity: 247699161084 (230.69 GB)
Present Capacity: 233972113408 (217.9 GB)
DFS Remaining: 162082414592 (150.95 GB)
DFS Used: 71889698816 (66.95 GB)
DFS Used%: 30.73%
Under replicated blocks: 40
Blocks with corrupt replicas: 6
Missing blocks: 0
Based on the disk sizes of the machines in this cluster, it seems that this report does NOT account for the triple replication, i.e., if I place a file on HDFS, I have to account for the triple replication myself.
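To make that concrete, here is the back-of-the-envelope arithmetic I am doing with the figures above (a minimal sketch in Python; the replication factor of 3 is my assumption, since that is the dfs.replication default, and I have not confirmed it for this cluster):

```python
# Figures copied from the dfsadmin report above, in bytes.
dfs_remaining = 162082414592  # "DFS Remaining" (raw space across all DataNodes)
replication = 3               # assumed dfs.replication; not confirmed for this cluster

# If the report shows raw disk space, the logical (pre-replication)
# room left for new file data would be a third of it:
logical_remaining = dfs_remaining / replication
print(f"{logical_remaining / 1024**3:.2f} GB left for file data")  # ~50.32 GB
```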
For example, if I placed a 50 GB file on HDFS, would my HDFS be dangerously close to full (since that file would be replicated 3 times, using up roughly all of the 150 GB that currently remain)?
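Spelling out that scenario under the same assumption:

```python
# Hypothetical 50 GB file, written with triple replication (assumed factor of 3).
file_size = 50 * 1024**3
dfs_remaining = 162082414592      # "DFS Remaining" from the report above

raw_needed = file_size * 3        # raw bytes consumed across the cluster
print(raw_needed)                 # 161061273600 bytes (150 GB)
print(dfs_remaining - raw_needed) # 1021140992 bytes (~0.95 GB) of raw space left
```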