I'm looking for specific information about the chain of events when a MapReduce job runs on a Hadoop cluster.
Let's assume that my reduce tasks are on the verge of completion. After my last reducer has written its output to the output file, how many replicas of the output file are there? What exactly happens after the last reducer has finished writing to the output file? When does the NameNode instruct the relevant DataNodes to replicate the output file? And how is the NameNode informed that the output file is ready? Who conveys that information to the NameNode?
Thank you!