2
votes

I know joins can be performed using the dsl api. We need to use the processor api for various reasons.

How would one implement joining to streams using the processor api. Some ideas I had but dont think they are right.

  1. One processor with multiple source topics. Base Object for the process interface and then cast to correct type inside process method.

  2. Two Processors each with their own source topic. Each processor gets read only access to the other processors state store (if that is possible).

Any ideas - I did find the join implementation in KStreamImpl but am having trouble following. Perhaps an explantation on how the dsl does it?

1

1 Answers

5
votes

Both implementations you suggest are possible. Kafka Stream itself uses 5 processor to implement stream-stream join:

source1 ---> "state maintainer 1" --> "joiner 1" ----+
                      |                   |          |
                   updates          "join lookups"   |
                      |                   |          +-----+
                      |            +------+                |
                      v            |                       v
                  "state 1" <------|------+             "merger" -->
                                   |      |                ^
                  "state 2" <------+      |                |
                      ^                   |          +-----+
                      |                   |          |
                   updates          "join lookups"   |
                      |                   |          |
source2 ---> "state maintainer 2" --> "joiner 2" ----+

Left and right pipeline are symmetric. Both have a "state maintainer" and "joiner" Processor. "State maintainer" has write access to the state. "Joiner" as read access to the other state. As a last step, both join result streams are merged together.