I hope we can get some advice from the smart people here.
We have a Hadoop cluster with 5 DataNode machines (worker machines). Our HDFS size is almost 80 TB, and we are at 98% used capacity!
For cost reasons we can't increase the HDFS size by adding disks to the DataNodes, so we are thinking of decreasing the HDFS replication factor from 3 to 2.
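For reference, my understanding is that the change itself would look something like the following (just a sketch; /data is a placeholder path, and dfs.replication in hdfs-site.xml only sets the default for newly created files):

    # Default replication for new files, in hdfs-site.xml:
    #   <property>
    #     <name>dfs.replication</name>
    #     <value>2</value>
    #   </property>

    # Lower the replication factor of existing files (recursive for a directory);
    # -w waits until the target replication is reached:
    hdfs dfs -setrep -w 2 /data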
Let's do a simulation: if we decrease the HDFS replication factor from 3 to 2, it means we keep only 2 replicas of each block.

But here is the question: the third replica that was created under the previous replication factor of 3 still exists on the HDFS disks. So how does HDFS know to delete that third replica? Is that something HDFS does automatically? Or is there no way to delete the old replicas that were created under the previous replication factor?
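And if the extra replicas do get cleaned up, how would we verify that they are actually gone? I assume checks along these lines would show it (again just a sketch, with /data as a placeholder):

    # Show per-file replication and block details for a path:
    hdfs fsck /data -files -blocks

    # Cluster-wide capacity and per-DataNode usage summary:
    hdfs dfsadmin -report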