Can someone help me understand what would happen in the following scenario:

A stream from Topic A has various operations performed on it, which cause multiple internal Kafka topics to be generated, such as KSTREAM-REDUCE-0000000014, KSTREAM-JOIN-0000000358, etc.

These show up in the topology as "consumer-group-name-generated-name".

Topic A is joined with Topic B. B has to be re-keyed to join with A, which writes it to an internal topic "group-Re-KeyB".

If my topology changes, then unless all these internal topics keep the same names, I need to change my consumer group name. Otherwise a generated topic such as KSTREAM-REDUCE-0000000014 might contain a different kind of object.

If I set the offsets for the new consumer group to the latest committed offsets of the previous consumer group, we won't replay Topic A or B from the beginning.

What happens to those internal topics? Would "group-Re-KeyB", for example, have all the data needed to join with A, or would it only know about new Topic B records?

1 Answer

If you change your topology and the generated names change, the old and new topologies are most likely incompatible. In that case, it is recommended to reset your application and let the new topology reprocess all data from the beginning (to rebuild its required internal state): https://docs.confluent.io/current/streams/developer-guide/app-reset-tool.html
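
To illustrate why a name change matters: Kafka Streams prefixes internal topics with the application.id and appends "-repartition" or "-changelog" to the operator or store name. The sketch below only mimics that string convention in plain Java (no Kafka dependency). The class and method names, the application.id "group", and the index values 14 and 15 are hypothetical and chosen for illustration only.

```java
public class InternalTopicNames {

    // Repartition topic: <application.id>-<operator name>-repartition
    public static String repartitionTopic(String applicationId, String operatorName) {
        return applicationId + "-" + operatorName + "-repartition";
    }

    // Changelog topic: <application.id>-<state store name>-changelog
    public static String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    // Generated names end in a zero-padded, 10-digit index that depends on
    // the position of the operator in the topology.
    public static String generatedName(String prefix, int index) {
        return String.format("%s%010d", prefix, index);
    }

    public static void main(String[] args) {
        String appId = "group"; // hypothetical application.id

        // Same reduce store before and after one operator is added upstream:
        // the index shifts, so the changelog topic name no longer matches.
        String before = generatedName("KSTREAM-REDUCE-STATE-STORE-", 14);
        String after = generatedName("KSTREAM-REDUCE-STATE-STORE-", 15);
        System.out.println(changelogTopic(appId, before));
        System.out.println(changelogTopic(appId, after));

        // An explicitly named store yields the same topic name either way.
        System.out.println(changelogTopic(appId, "latest-b"));
    }
}
```

With generated names, the new topology would look for a changelog topic that does not exist yet and would ignore the old one, which is why a reset is the safe default.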

As an alternative, you can specify explicit names for all operators (as of Kafka Streams 2.4). For example, you can use Materialized.as(...) to name a state store and the corresponding changelog topic. Explicit naming prevents the names of internal topics from changing, so even if you change the topology, you might be able to restart the new topology without using a new application.id and thus preserve the state from the old topology.
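
A minimal sketch of explicit naming, assuming kafka-streams 2.4 or later on the classpath. The topic name "topic-b", the operator names, the store name "latest-b", the String serdes, and the re-keying logic are all placeholders, not taken from the question. Grouped.with(name, ...) names the repartition topic, Materialized.as(...) names the state store and its changelog topic, and Named.as(...) names the processors themselves.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Named;
import org.apache.kafka.streams.state.KeyValueStore;

public class NamedTopology {

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Placeholder input topic with String keys and values.
        KStream<String, String> topicB =
            builder.stream("topic-b", Consumed.with(Serdes.String(), Serdes.String()));

        // Re-key the stream (placeholder logic: use the value as the new key).
        KStream<String, String> rekeyed =
            topicB.selectKey((key, value) -> value, Named.as("select-join-key"));

        // The Grouped name becomes the repartition topic name:
        // <application.id>-re-key-b-repartition
        KGroupedStream<String, String> grouped =
            rekeyed.groupByKey(Grouped.with("re-key-b", Serdes.String(), Serdes.String()));

        // The Materialized name becomes the store name and the changelog topic:
        // <application.id>-latest-b-changelog
        grouped.reduce(
            (oldValue, newValue) -> newValue,
            Named.as("latest-b-reduce"),
            Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("latest-b")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Print the topology to check that no generated KSTREAM-... names
        // remain for the repartition topic or the state store.
        System.out.println(build().describe());
    }
}
```

Printing the topology description before and after a change is a cheap way to confirm that the repartition topic and store names stayed the same. Stable names alone do not guarantee compatibility: if the data format or the meaning of a store changes, a reset may still be needed.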