1
votes

Are there any other ways to monitor and verify large Hadoop DistCp cluster-to-cluster HDFS copy jobs besides examining the YARN/MapReduce logs? (Millions of small and large files; estimated runtime: a couple of days; network speed varies because the environment is virtualized and the cluster is also in production use in parallel.)

Using DistCp V2 and Apache Hadoop 2.7.3 on HDP 2.6.1


1 Answer

1
votes

Write the copy status to a log by adding the following argument to your distcp command:

-log <logdir>

Write logs to <logdir>. DistCp keeps logs of each file it attempts to copy as map output. If a map fails, the log output will not be retained if it is re-executed.
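As a rough sketch of how the option fits into a full command: the NameNode addresses, paths, and log directory below are made up for illustration, and the `run` helper with its `DRY_RUN` switch is my own convenience wrapper, not part of Hadoop. By default it only prints the commands so you can review them before starting a multi-day copy.

```shell
#!/bin/sh
# Hypothetical cluster addresses and paths; substitute your own.
SRC="hdfs://nn1:8020/data/source"
DST="hdfs://nn2:8020/data/target"
LOGDIR="hdfs://nn2:8020/tmp/distcp-logs"

# Assumption: print-only by default; set DRY_RUN=0 to actually execute.
DRY_RUN="${DRY_RUN:-1}"

# Helper (not a Hadoop tool): echo the command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Start the copy and have DistCp write its per-file records under LOGDIR.
run hadoop distcp -log "$LOGDIR" "$SRC" "$DST"

# Afterwards, list what was written to the log directory.
run hadoop fs -ls "$LOGDIR"
```

Run it once as-is to check the printed commands, then again with `DRY_RUN=0` on a machine that has the Hadoop client configured. Keep in mind the caveat above: output from a failed map attempt is not retained when the map is re-executed, so the log directory may not cover every attempt.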