The most common cause of "local-dirs are bad" is disk utilization on the node exceeding YARN's max-disk-utilization-per-disk-percentage threshold, which defaults to 90.0%.
Either free up space on the disk of the unhealthy node, or raise the threshold in yarn-site.xml:
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>98.5</value>
</property>
Avoid disabling the disk check, because your jobs may fail when the disk eventually runs out of space or if there are permission issues. Refer to the yarn-site.xml Disk Checker section for more details.
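To see how close a directory is to the threshold before changing anything, a rough check like the one below may help. It is only a sketch: the function names are made up for illustration, the 90.0 default simply mirrors the value quoted above, and the percentage it computes may differ slightly from what the NodeManager itself reports (for example because of blocks reserved for root).

```python
import shutil
import sys

# Assumption: mirrors the 90.0% default quoted above. Pass a different
# value if yarn-site.xml overrides the threshold.
DEFAULT_MAX_UTILIZATION_PERCENT = 90.0


def disk_utilization_percent(path):
    """Return the used percentage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a zero size; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def exceeds_threshold(path, threshold=DEFAULT_MAX_UTILIZATION_PERCENT):
    """Return True if utilization of `path`'s filesystem is above `threshold`."""
    return disk_utilization_percent(path) > threshold


if __name__ == "__main__":
    # Hypothetical usage: pass the NodeManager local dir as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    percent = disk_utilization_percent(target)
    status = "over" if exceeds_threshold(target) else "under"
    print(f"{target}: {percent:.1f}% used, {status} the "
          f"{DEFAULT_MAX_UTILIZATION_PERCENT}% threshold")
```

Run it against each directory listed in yarn.nodemanager.local-dirs; any directory that reports as over the threshold is a likely candidate for cleanup.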
FSCK
If you suspect a filesystem error in the directory, you can check it by running:
hdfs fsck /tmp/hadoop-hduser/nm-local-dir