On a Kafka broker, it's recommended to use multiple drives for the message logs to improve throughput. That's why the broker has a log.dirs property that accepts multiple directories, to which partitions are assigned in a round-robin fashion.

We already have a lot of installations set up this way for event-driven Kafka applications, for example 4 nodes with 5 disks each.

Now we want to use Kafka Streams with a key-value store in which we persist computed data for fast range queries. We see that Kafka Streams maps each partition one-to-one to its own state store and creates a separate subdirectory for each one.

However, we can't configure how to spread those subdirectories across different disks. We can only configure a single parent directory as 'state.dir' (StreamsConfig.STATE_DIR_CONFIG).
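For reference, here is a minimal sketch of the only setting we found. It uses plain java.util.Properties so it runs without the Kafka jars; the literal key "state.dir" is the value behind StreamsConfig.STATE_DIR_CONFIG. The application id and the path /data/disk1/kafka-streams are made-up placeholders, not our real values.

```java
import java.util.Properties;

class StateDirConfigSketch {
    // Builds only the Streams properties relevant to this question.
    static Properties streamsProps(String stateDir) {
        Properties props = new Properties();
        // Hypothetical application id, used for illustration only.
        props.put("application.id", "range-query-app");
        // Single parent directory; every state store subdirectory ends up below it.
        props.put("state.dir", stateDir);
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProps("/data/disk1/kafka-streams");
        System.out.println(props.getProperty("state.dir"));
    }
}
```

The property takes exactly one path, so all state ends up on whichever disk holds that directory.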

Is there a configuration I am missing? Or is having multiple disks not so relevant for Kafka Streams?

There is no config for this. Feel free to create a feature request ticket. (Comment by Matthias J. Sax)

1 Answer

It's not really relevant for Kafka Streams itself. Spreading the state across disks has to be handled at the OS level, for example with a RAID configuration.

Or you can implement the StateStore interface and write your own provider that can use multiple disks (or remote distributed filesystems).
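As a rough sketch of the placement logic such a custom provider might use, the helper below maps a partition number to one of several mount points in round-robin fashion, similar in spirit to what log.dirs does on the broker. The class name DiskSelector, the mount paths, and the store name are all hypothetical. The sketch deliberately leaves out the Kafka Streams wiring; a real store would take the partition from the task it is initialized for.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Kafka Streams.
class DiskSelector {
    private final List<Path> mounts;

    DiskSelector(List<Path> mounts) {
        if (mounts.isEmpty()) {
            throw new IllegalArgumentException("need at least one mount point");
        }
        this.mounts = List.copyOf(mounts);
    }

    // Round-robin by partition number: partition p lands on mount (p mod N).
    Path directoryFor(String storeName, int partition) {
        Path mount = mounts.get(Math.floorMod(partition, mounts.size()));
        return mount.resolve(storeName).resolve("partition-" + partition);
    }

    public static void main(String[] args) {
        DiskSelector selector = new DiskSelector(List.of(
                Paths.get("/data/disk1"),
                Paths.get("/data/disk2"),
                Paths.get("/data/disk3")));
        // Print the directory each of six partitions would use.
        for (int p = 0; p < 6; p++) {
            System.out.println(p + " -> " + selector.directoryFor("computed-values", p));
        }
    }
}
```

Because the mapping depends only on the partition number, a restarted instance would find the same partition's data on the same disk, provided the list of mount points does not change.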