2
votes

I have a topic keyed by byte[], I want to repartition it and process the topic by another key in a field in the message body.

I find there is KGroupedStream and groupby function. But it asks for an aggregation function to convert to a KTable/KStream. I don't need an aggregate. I just want to repartition and process the output.

2

2 Answers

8
votes

Yes you can. You set a new key and afterwards pipe the data through another topic.

// repartition() will create the required topic automatically for your,
// with the same number of partitions as your input topic;
//
// it's also possible to set the number of partitions explicitly to scale in/out
// via `repartitioned(Repartitioned.numberOfPartitions(...))`
KStream stream = ...
KStream repartionedStream = stream.selectKey(...)
                                  .repartition();

// older versions:
//
// using `through()` you need to create the use topic manually,
// before you start your application
KStream stream = ...
KStream repartionedStream = stream.selectKey(...)
                                  .through("topic-name");

Note, that you need to create the topic you use in through() before you start the application with the desired number of partitions.

-1
votes

Not sure if this is entirely kosher, but it works and the repartition topic is created automatically and with the right number of partitions wrt the stream.

KTable emptyTable = someTable.filter((k, v) -> false);
KStream stream = ...
KStream repartionedStream = stream.selectKey(...)
                            .leftJoin(emptyTable, (v, Null) -> v, ...);

Edit

Please use with caution! This approach apparently became a complicated abomination deserving of an avalanche of downvotes and a flogging in Aug 2020 when Kafka Streams 2.6.0 was introduced and KStream.repartition() came into existence.

So for streams version 2.6+ you must use

KStream stream = ...
KStream repartionedStream = stream.selectKey(...)
                                  .repartition();