I am inspecting the Apache Flink code that creates connection clients: https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/PartitionRequestClientFactory.java#L55-L108

I am trying to understand the waitForChannel() method, which times out after 2 seconds: https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/PartitionRequestClientFactory.java#L191

I don't like this timeout, and I think that calling notifyAll() when an error occurs or a partitionRequestClient arrives would suffice.

Am I correct? Or do we want to keep trying to connect after each 2-second wait?

Looks like you're correct from here; that timeout is pretty much useless. I'm hesitant to say for sure, because there must be a really good reason to wait for exactly 2 seconds if that's the way it's written. - user2982130
This sort of detailed question about Flink's implementation is better suited to one of the Flink mailing lists, probably the dev list. - David Anderson

1 Answer


We're actually not trying to (re)connect after waiting; we're just re-entering the loop to check the condition. The wait also ends as soon as any of the connectLock.notifyAll() calls is executed, even before the timeout expires.

Normally, a timeout like this gives you the chance to react to cases where such a notification never arrives, but like xTrollxDudex above, I do not actually see any other place here that may lead out of the loop.
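
To illustrate the loop described above, here is a minimal sketch of the pattern. It is not Flink's actual code: the class name ConnectWaitSketch, the methods onSuccess() and onFailure(), and the use of a String in place of the real client object are all assumptions made for the example. Only connectLock, waitForChannel(), notifyAll() and the 2-second timeout come from the question.

```java
// Hypothetical stand-in for the waiting logic; not the real Flink class.
final class ConnectWaitSketch {
    private final Object connectLock = new Object();
    private String client;    // stands in for the partitionRequestClient
    private Throwable error;

    /** Blocks until a client or an error has been set. */
    String waitForChannel() {
        synchronized (connectLock) {
            while (error == null && client == null) {
                try {
                    // Returns on notifyAll(), after 2 seconds, or spuriously.
                    // In every case the loop only re-checks the condition;
                    // nothing here attempts to reconnect.
                    connectLock.wait(2000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting", e);
                }
            }
            if (error != null) {
                throw new IllegalStateException("Connecting failed", error);
            }
            return client;
        }
    }

    /** Called by the connecting thread once the client is available. */
    void onSuccess(String newClient) {
        synchronized (connectLock) {
            client = newClient;
            connectLock.notifyAll(); // wakes waiters before the timeout expires
        }
    }

    /** Called by the connecting thread when the connection attempt fails. */
    void onFailure(Throwable cause) {
        synchronized (connectLock) {
            error = cause;
            connectLock.notifyAll();
        }
    }
}
```

In this sketch, replacing wait(2000) with wait() would behave the same as long as every path that sets client or error also calls notifyAll(). The timeout only matters if some path could change the state without notifying.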