I have a Kafka topic with 2 million messages. My flush size is 100000, the topic uses the default number of partitions, and Kafka Connect runs in distributed mode with 4 workers. The data is written to HDFS almost immediately, within 10 to 15 seconds.
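To put those numbers in perspective, here is a rough sketch of the arithmetic. It assumes that the flush size is counted per topic partition and that messages are spread evenly across partitions. The partition count is not stated above, so the value used in the usage lines is only a placeholder, and the function names are made up for illustration.

```python
def throughput(messages: int, seconds: float) -> float:
    """Average messages per second for a transfer of the given duration."""
    return messages / seconds


def committed_files(messages: int, partitions: int, flush_size: int) -> int:
    """Files committed if each partition commits one file per full flush.

    Assumes an even spread of messages across partitions (an assumption,
    not something measured). Records left over after the last full flush
    are not counted, since they would not yet fill a file.
    """
    per_partition = messages // partitions
    return partitions * (per_partition // flush_size)


# Observed: 2 million messages in roughly 10 to 15 seconds.
fast = throughput(2_000_000, 10)   # upper end of the observed rate
slow = throughput(2_000_000, 15)   # lower end of the observed rate

# Placeholder partition count of 1; substitute the topic's real value.
files = committed_files(2_000_000, partitions=1, flush_size=100_000)
```

With these inputs the observed rate works out to somewhere between about 133,000 and 200,000 messages per second across the 4 workers.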
I see that a +tmp directory and a folder for the topic are created every time a new connector is triggered.
Is it normal for Kafka Connect to write this fast every time, or is the data already stored somewhere in HDFS and then moved to the topic directory based on the connector properties?
If I want to measure the latency of this, how can I calculate it?
Also, if I stop the connector, delete the topic directory in both /topics and /temp, and retrigger the connector for the same topic, will it pull the data from Kafka again, or will it get the data from some place in HDFS that acts as a backup?
I need clarity on how this works. Please let me know if my understanding is not right.