I am trying to make a local aggregation.
The input topic has records containing multiple elements and I am using flatmap to split the record into multiple records with another key (here element_id). This triggers a re-partition as I am applying a grouping for aggregation later in the stream process.
Problem: there are way too many records in this repartition topic and the app cannot handle them (lag is increasing).
Here is a example of the incoming data
key: another ID
value:
{
"cat_1": {
"element_1" : 0,
"element_2" : 1,
"element_3" : 0
},
"cat_2": {
"element_1" : 0,
"element_2" : 1,
"element_3" : 1
}
}
And an example of the wanted aggregation result:
key : element_2
value:
{
"cat_1": 1,
"cat_2": 1
}
So I would like to make a first "local aggregation" and stop splitting incoming records, meaning that I want to aggregate all elements locally (no re-partition) for example in a 30 seconds window, then produce result per element in a topic. A stream consuming this topic later aggregates at a higher level.
I am using Stream DSL, but I am not sure it is enough. I tried to use the process() and transform() methods that allow me to benefit from the Processor API, but I don't known how to properly produce some records in a punctuation, or put records in a stream.
How could I achieve that ? Thank you