Could somebody shed some light on what happens if a Flume agent gets killed in the middle of an HDFS file write (say, when writing Avro format)? Will the file get corrupted and all the events in it be lost?
I understand that there are transactions between the different elements of the Flume data chain (source->channel->sink). But I believe that an HDFS file may stay open (as .tmp) across consecutive channel->sink transactions. So suppose one transaction of, say, 100 events succeeds (the events are stored in the file and the transaction is committed), and the next transaction fails in the middle of the HDFS write. Could it be that the original 100 events from the first transaction become unreadable (because of file corruption, for instance)? How can Flume ensure that the 100 events from the first, committed transaction are not affected by this type of failure? Or is there simply no such guarantee?
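For context, here is a minimal sketch of the kind of agent configuration I have in mind (the agent/source/channel/sink names, path, and the exact roll settings are made up for illustration). The HDFS sink writes events as an Avro container file via the avro_event serializer, flushes batches of 100 events per channel->sink transaction, and rolls the file only by time, so the same .tmp file stays open across many transactions:

# hypothetical agent "a1": avro source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.serializer = avro_event      # events written as an Avro container file
a1.sinks.k1.hdfs.batchSize = 100         # 100 events flushed per channel->sink transaction
a1.sinks.k1.hdfs.rollCount = 0           # never roll by event count
a1.sinks.k1.hdfs.rollSize = 0            # never roll by file size
a1.sinks.k1.hdfs.rollInterval = 600      # roll only every 10 minutes, so the open .tmp
                                         # file spans many committed transactions

With settings like these, a single .tmp file accumulates many committed batches before it is finally renamed, which is exactly the window I'm worried about if the agent dies mid-write.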