1 vote

I've been working on an application based on the Java kafka-streams API. Its goal is to process a stream of data from one Kafka topic and produce it into another topic.

It seems that whenever I start producing messages with the kafka-streams application, file handles keep opening on the Kafka brokers I'm using and are never closed. Eventually the Kafka server ends up with too many open files, and the Kafka and ZooKeeper daemons crash.

I'm using the kafka-streams 1.0.1 API jar for Java, running on JDK 11. The Kafka cluster runs Kafka 1.0.0.

My application's configuration includes the following Kafka producer overrides (a sketch of how they are wired up follows the list):

  • batch.size: set to 100,000 (note that Kafka measures this setting in bytes, not messages).
  • linger.ms: set to 1,000 milliseconds.
  • buffer.memory: set to 5 MB, expressed in bytes.
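
For reference, this is roughly how the overrides are wired into the Streams configuration; the application id, broker address, and serdes below are placeholders, not my actual values:

import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsProps {

    static Properties buildConfig() {
        Properties props = new Properties();

        // Placeholder application id and broker address.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        // Placeholder serdes; the real application may use different types.
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Producer-level overrides, namespaced with producerPrefix() so the
        // embedded producer of the Streams application picks them up.
        props.put(StreamsConfig.producerPrefix(ProducerConfig.BATCH_SIZE_CONFIG), 100_000);
        props.put(StreamsConfig.producerPrefix(ProducerConfig.LINGER_MS_CONFIG), 1_000);
        props.put(StreamsConfig.producerPrefix(ProducerConfig.BUFFER_MEMORY_CONFIG), 5 * 1024 * 1024L);

        return props;
    }
}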

The stream processing itself is very simple, and is composed as follows:

stream.map((k,v) -> handle(k,v)).filter((k,v) -> v != null).to(outgoingTopic);
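
For completeness, a self-contained sketch of the whole application; the topic names and the body of handle() are placeholders for my actual logic, and it reuses the configuration sketch above:

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class PipeApp {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Placeholder topic names.
        KStream<String, String> stream = builder.stream("incomingTopic");

        stream.map((k, v) -> handle(k, v))   // transform each record
              .filter((k, v) -> v != null)   // drop records the handler rejected
              .to("outgoingTopic");

        KafkaStreams streams = new KafkaStreams(builder.build(), StreamsProps.buildConfig());
        streams.start();

        // Close the Streams client cleanly on shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    // Stand-in for the real per-record processing.
    private static KeyValue<String, String> handle(String k, String v) {
        return KeyValue.pair(k, v);
    }
}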

I would appreciate any suggestions you guys might have.

Confluent's recommendation is to configure the kernel of the host to allow 100,000+ open file handles; the default is normally way below that. – daniu
And it is configured to that level. Still, files keep opening. – Stav Saad
I don't think Java 11 is officially supported or tested by the Kafka community. Not sure if that's the problem, though. – OneCricketeer
Java 11 is used as the runtime of my own application and should not affect what happens on the Kafka servers. – Stav Saad
Your usage of Kafka Streams does seem pretty simple. I would suggest trying a more recent version of Kafka to see whether that fixes the issue: 2.0.0 has been out for a few months already, and 2.1.0 is about to be released, probably in the next couple of weeks. – mjuarez

2 Answers

0 votes

Use Java 10 or lower (e.g., Java 8), and use the latest Kafka: https://kafka.apache.org/quickstart

See the reports on the filed bug here: https://issues.apache.org/jira/browse/KAFKA-6855

0 votes

It seems that overriding the Kafka Streams timestamp extractor is not a good idea if messages might produce out-of-order timestamps. After reverting to the default timestamp extractor, everything was fixed.
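
If it helps anyone, the revert amounts to dropping the custom override of default.timestamp.extractor, or equivalently setting it back to the built-in default, which in Kafka Streams 1.0.x is, to my knowledge, FailOnInvalidTimestamp:

import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.FailOnInvalidTimestamp;

public class TimestampFix {

    static Properties withDefaultExtractor(Properties props) {
        // Simplest revert: remove the custom override so Kafka Streams
        // falls back to its built-in default extractor.
        props.remove(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG);

        // Equivalent explicit form (FailOnInvalidTimestamp is the 1.0.x default):
        props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
                  FailOnInvalidTimestamp.class.getName());

        return props;
    }
}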