I have a single Kafka topic with multiple user info events for multiple different users. I'm trying to figure out how to aggregate these events together using a number of fields from the value.

For example:

Input Topic:

1:{"SSN":"123456"}
2:{"twitterHandle":"elvis"}
3:{"SSN":"123456","twitterHandle":"elvis","accountNum": "111111"}
4:{"SSN":"123456"}
5:{"SSN":"000000"}
6:{"twitterHandle":"foo"}
7:{"SSN":"000000","twitterHandle":"foo"}
8:{"SSN":"000000"}

I want an Output Topic (aggregated):

{"SSN":"123456","twitterHandle":"elvis","accountNum": "111111"}
{"SSN":"000000","twitterHandle":"foo"}

How can I achieve this with Kafka Streams? Can I create a KStream from the input topic and convert it to a KTable to produce the output topic?

Update: The topic contains events from multiple different users. The user identifiers (SSN, twitterHandle) are not fixed. There may be other IDs for a user.

It's not clear how many events within a single input topic you would like to aggregate. Can the input topic contain events for more than two users? - dmkvl
Yes, the topic contains multiple events for different users, and I need to aggregate the events for the same user based on a composite key (identifiers such as SSN, Twitter handle, etc.). An event may contain only one identifier for that user, such as SSN. - DarVar
How will you find out which twitterHandle needs to be mapped to which SSN? - mythic
@nikitap Initially there is no way. But when event 3 comes in, it links these keys, like a composite key. So for events 1 and 2 I want to assume they are different users, and only after event 3 comes in do I aggregate them into a single user. - DarVar
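
The linking behaviour described in these comments can be sketched in plain Java, independent of Kafka. This is only an illustration of the merge logic, not a Kafka Streams topology: the class `IdentityMerger` and its methods are hypothetical names, and it assumes that every field in an event is an identifier and that a user has at most one value per field. In a Kafka Streams application this state would presumably have to live in a state store, and routing events that share no key to the same partition is a separate problem the sketch does not address.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Kafka API): merges events that share any identifier value.
class IdentityMerger {
    // Maps "field=value" to the merged profile that currently owns that identifier.
    private final Map<String, Map<String, String>> index = new LinkedHashMap<>();

    // Adds one event and returns the profile it was merged into.
    // Assumption: every field of the event is treated as an identifier.
    Map<String, String> add(Map<String, String> event) {
        Map<String, String> merged = new LinkedHashMap<>();
        // Pull in every existing profile that shares an identifier with this event.
        for (Map.Entry<String, String> e : event.entrySet()) {
            Map<String, String> existing = index.get(e.getKey() + "=" + e.getValue());
            if (existing != null) {
                merged.putAll(existing);
            }
        }
        merged.putAll(event);
        // Point every identifier of the merged profile at the merged profile.
        for (Map.Entry<String, String> e : merged.entrySet()) {
            index.put(e.getKey() + "=" + e.getValue(), merged);
        }
        return merged;
    }

    // Returns the distinct profiles known so far.
    List<Map<String, String>> profiles() {
        return new ArrayList<>(new LinkedHashSet<>(index.values()));
    }
}
```

Feeding events 1 and 2 from the question into `add` leaves two separate profiles; once event 3 arrives, both are folded into one profile carrying SSN, twitterHandle and accountNum. If a user could have two different values for the same field, older index entries would go stale, so a real implementation would need an explicit conflict rule.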

1 Answer


If you simply want to drop messages 1 and 2 and keep message 3, you can use a consumer interceptor.

The interceptor would parse the JSON message, check that both keys are present (and not null), and pass the message on only if they are; otherwise it drops the message. You don't need a Kafka Streams app in that case, just one interceptor class that is used while consuming the messages.
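
The check itself could look roughly like the following. This is a minimal sketch in plain Java that assumes the JSON value has already been parsed into a `Map<String, String>`; the class `LinkedEventFilter` and its method names are made up for illustration. Wiring it into Kafka (for example, calling it from the `onConsume` method of a class implementing `org.apache.kafka.clients.consumer.ConsumerInterceptor`) is left out here.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not a Kafka API): keeps only events that carry all required keys.
class LinkedEventFilter {
    // True if every required key is present in the event with a non-null value.
    static boolean hasAllKeys(Map<String, String> event, List<String> requiredKeys) {
        for (String key : requiredKeys) {
            if (event.get(key) == null) {
                return false;
            }
        }
        return true;
    }

    // Drops every event that is missing at least one of the required keys.
    static List<Map<String, String>> keepLinked(List<Map<String, String>> events,
                                                List<String> requiredKeys) {
        return events.stream()
                .filter(e -> hasAllKeys(e, requiredKeys))
                .collect(Collectors.toList());
    }
}
```

With `SSN` and `twitterHandle` as the required keys, only events 3 and 7 from the question would pass. Note that this relies on a fixed list of required keys, which may not hold if the set of identifiers varies.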

However, if you want to stitch messages 1 and 2 together without any common key between them, I don't think that's possible, because we don't know which SSN should be merged with which Twitter handle.

Let me know if I can help in any other way.