
Let's say we have a Kafka Streams instance with the following config, with state maintained in a store.

topic - 1
partitions - 6
num.stream.threads - 6

Topology
source - 1
low-level processors - 3 (one each for daily, monthly & yearly aggregation)
sink - 3
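
For reference, a minimal Processor API sketch of the topology described above; the topic names and the processor classes (DailyProcessor etc.) are hypothetical placeholders:

    import org.apache.kafka.streams.Topology;

    Topology topology = new Topology();
    topology.addSource("source", "houses");  // hypothetical source topic with 6 partitions
    topology.addProcessor("daily", DailyProcessor::new, "source");
    topology.addProcessor("monthly", MonthlyProcessor::new, "source");
    topology.addProcessor("yearly", YearlyProcessor::new, "source");
    topology.addSink("daily-sink", "daily-agg", "daily");       // sink 1
    topology.addSink("monthly-sink", "monthly-agg", "monthly"); // sink 2
    topology.addSink("yearly-sink", "yearly-agg", "yearly");    // sink 3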

  • How many parallel tasks are possible with the above topology and topic partitions?
  • Suppose 2 parallel tasks are assigned to the daily processor, and a punctuate is scheduled to run every 30 minutes that forwards the entire store to sink 1 (see the sketch after this list). Would the key-value store be posted twice to the sink because the 2 parallel tasks share the same store, or does each task have its own store and only publish the data for the partitions it is assigned, which are persisted in its respective store?

    // Scan this task's local store and forward every entry downstream.
    // try-with-resources closes the iterator even if forwarding fails.
    try (KeyValueIterator<String, House> iterator = houseStore.all()) {
        while (iterator.hasNext()) {
            KeyValue<String, House> next = iterator.next();
            context.forward(next.key, next.value);
        }
    }
    
  • How many tasks will there be if we instead use KTables (one each for daily, monthly and yearly aggregation) via the high-level DSL? Can two parallel tasks update the same KTable (say, daily)?
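
For context, a minimal sketch of how such a punctuation could be scheduled in the daily processor's init(); the store name "house-store" and the forwardAllHouses() helper (which wraps the store scan shown above) are hypothetical:

    import java.time.Duration;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.state.KeyValueStore;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        // Each task gets its own instance of this store, holding only the
        // shard for the partitions assigned to that task.
        this.houseStore = (KeyValueStore<String, House>) context.getStateStore("house-store");
        // Wall-clock punctuation every 30 minutes, invoking the scan above.
        context.schedule(Duration.ofMinutes(30), PunctuationType.WALL_CLOCK_TIME,
                timestamp -> forwardAllHouses());  // hypothetical helper wrapping the scan
    }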

1 Answer


Kafka Streams will create 6 tasks because the source topic has 6 partitions. The state will be partitioned into 6 shards, one shard per task. Thus, the local store in a task is task-exclusive and only contains the data of the corresponding shard. If you scan the whole store in each task, you will not get duplicate data in your output topics, because each shard holds different data.
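
To make the parallelism concrete, a minimal sketch of the corresponding configuration; the application id and bootstrap servers are placeholders:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "house-aggregations"); // hypothetical app id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker
    // 6 partitions on the source topic => 6 tasks, regardless of thread count.
    // With num.stream.threads = 6, each thread runs exactly one task, and each
    // task owns its own state store shard covering only its assigned partitions.
    props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 6);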