1 vote

Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=1, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}

We have a scenario where multiple HDFS files are being written (on the order of 500-1000 files, with at most 10-40 written concurrently). We do not call close on each file after every write; instead we keep writing until the end and only then call close.
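Roughly, the write pattern looks like the sketch below (assuming the standard HDFS Java client; the paths, counts, and record contents are illustrative placeholders, not our real code):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ManyOpenFilesSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            List<FSDataOutputStream> streams = new ArrayList<>();

            // Open a batch of files up front; every stream stays open while we write.
            for (int i = 0; i < 40; i++) {                           // 10-40 files at a time
                streams.add(fs.create(new Path("/data/part-" + i))); // hypothetical paths
            }

            // Keep appending to every open stream until all data has been produced.
            for (int round = 0; round < 1000; round++) {
                for (FSDataOutputStream out : streams) {
                    out.write(("record-" + round + "\n").getBytes("UTF-8"));
                }
            }

            // Close only at the end; each file holds a block under construction
            // on some datanode until this point.
            for (FSDataOutputStream out : streams) {
                out.close();
            }
            fs.close();
        }
    }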

Sometimes we get the above error and the write fails. We have set the HDFS retries to 10, but that does not seem to help.

We also increased dfs.datanode.handler.count to 200; that sometimes helped, but not always.

a) Would increasing dfs.datanode.handler.count help here, even when only about 10 files are written concurrently?

b) What can we do so that the error does not surface at the application level? The Hadoop monitoring page indicates that the disks are healthy, but the warning message does suggest that disks were sometimes unavailable:

    org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy

Assuming the above happens only when there are disk failures, we also tried setting dfs.client.block.write.replace-datanode-on-failure.enable to false so that temporary failures would not turn into errors, but that does not seem to help either.
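For reference, the client-side properties are applied on the Configuration used to open the FileSystem, roughly as in this sketch (values are illustrative; dfs.client.block.write.retries is one guess at what "retries to 10" maps to, and the commented-out best-effort variant is an alternative we have not tried):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ClientConfSketch {
        public static FileSystem open() throws Exception {
            Configuration conf = new Configuration();

            // Client-side retry count for block writes.
            conf.setInt("dfs.client.block.write.retries", 10);

            // What we tried: do not attempt to replace a failed datanode in the pipeline.
            conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", false);

            // Possible alternative: keep replacement enabled but make it best-effort,
            // so a failed replacement does not fail the whole write.
            // conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
            // conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
            // conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", true);

            // Note: dfs.datanode.handler.count is a datanode-side setting; it belongs in the
            // datanodes' hdfs-site.xml and requires a datanode restart, not client code.
            return FileSystem.get(conf);
        }
    }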

Any further suggestions here?


1 Answer

0 votes

In my case this was fixed by opening firewall port 50010 for the datanodes (running on Docker).
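To verify from the client side that the datanode data-transfer port is actually reachable (50010 is the default dfs.datanode.address port in Hadoop 2.x), a quick check like the sketch below can help; the hostname is a placeholder:

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class DataNodePortCheck {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "datanode1"; // placeholder hostname
            // Try to open a TCP connection to the datanode data-transfer port.
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, 50010), 5000);
                System.out.println(host + ":50010 is reachable");
            } catch (Exception e) {
                System.out.println(host + ":50010 is NOT reachable: " + e.getMessage());
            }
        }
    }

If the connection fails while the NameNode UI still shows the node as live, a firewall rule or missing Docker port mapping is a likely culprit.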