0
votes

When HDFS is not available, is there a way to make sure no data is lost? The scenario is: Kafka source, Flume memory channel, HDFS sink. If the Flume service goes down, can it store the offsets of the topic's partitions and resume consuming from the right position after recovery?

1 Answer

0
votes

Usually (with the default configuration), Kafka stores the committed offsets of every consumer group. If you restart the Flume source with the same group id (one of the consumer properties), Kafka will resume sending messages from the last offset your source committed. But messages that have already been read from Kafka and are sitting in your memory channel will be lost if the HDFS sink fails and the agent goes down before they are written, because the memory channel keeps events only in RAM.
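
To make the group id point concrete, here is a minimal sketch of an agent configuration, assuming Flume 1.7 or later (older releases use different Kafka source property names). The agent name `a1`, the broker and NameNode hosts, the topic, the group id, and all paths are placeholders I made up, not values from the question. The second channel block shows a file channel as a possible durable alternative to the memory channel; whether it fits depends on your throughput and disk setup.

```properties
# Hypothetical agent "a1": Kafka source -> channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Kafka source; keep the same group id across restarts so consumption
# resumes from the offsets committed for that consumer group
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = broker1:9092
a1.sources.r1.kafka.topics = my-topic
a1.sources.r1.kafka.consumer.group.id = flume-hdfs-group
a1.sources.r1.channels = c1

# Memory channel as in the question: events held here are lost
# if the agent process stops
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Possible alternative (assumption, not part of the original setup):
# a file channel persists events to local disk across agent restarts
# a1.channels.c1.type = file
# a1.channels.c1.checkpointDir = /var/flume/checkpoint
# a1.channels.c1.dataDirs = /var/flume/data

# HDFS sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a1.sinks.k1.channel = c1
```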