Below are some sentences from "Hadoop: The Definitive Guide", in the section "Anatomy of a File Write in HDFS". I am not clear on them; can someone provide more details?
If any datanode fails while data is being written to it, then the following actions are taken, which are transparent to the client writing the data. First, the pipeline is closed, and any packets in the ack queue are added to the front of the data queue so that datanodes that are downstream from the failed node will not miss any packets.
Q.) What does "datanodes that are downstream from the failed node will not miss any packets" mean? Can anyone explain this in more detail?
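To make the question concrete, here is how I currently picture the two queues the book mentions, as a minimal Java sketch. This is only my own simplified model, not the actual DFSOutputStream internals, and the names are mine. Please correct me if the model is wrong:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of the client-side queues described in the book
// (illustrative only; not the real Hadoop internals).
public class PipelineModel {
    // Packets waiting to be sent down the datanode pipeline.
    private final Deque<String> dataQueue = new ArrayDeque<>();
    // Packets already sent but not yet acknowledged by every datanode.
    private final Deque<String> ackQueue = new ArrayDeque<>();

    void sendOnePacket() {
        String pkt = dataQueue.pollFirst();
        if (pkt != null) {
            ackQueue.addLast(pkt); // keep it until all replicas ack
            // ... stream pkt to the first datanode in the pipeline ...
        }
    }

    // The failed datanode may have received packets that the datanodes
    // *after* it (downstream) never saw. Since unacknowledged packets
    // are still in the ack queue, re-queuing them at the FRONT of the
    // data queue makes the rebuilt pipeline resend them first, so the
    // downstream datanodes do not miss any packets.
    void onDatanodeFailure() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast()); // preserves order
        }
        // ... remove the failed node, set up a new pipeline, resume ...
    }
}
```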
When the client has finished writing data, it calls close() on the stream. This action flushes all the remaining packets to the datanode pipeline and waits for acknowledgements before contacting the namenode to signal that the file is complete.
Q.) What does "this action flushes all the remaining packets to the datanode pipeline" mean?
Q.) And if the client has finished writing data, why do packets still remain, and why does close() have to flush them to the datanodes?
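For context, this is the kind of client code I have in mind, using the standard HDFS FileSystem API (the path "/tmp/example.txt" is just a made-up example):

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        try (FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
            // As I understand it, write() only hands bytes to a
            // client-side buffer; they are split into packets and
            // streamed to the datanodes asynchronously, so returning
            // from write() does not mean the bytes reached any datanode.
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        } // close() flushes whatever packets are still buffered, waits
          // for their acknowledgements, and only then tells the
          // namenode the file is complete.
    }
}
```

Is this reading correct, i.e. that "finished writing" only means the application has no more bytes to hand over, while some packets may still be sitting in the client buffer or in flight, which is why close() has to flush and wait?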