We have just upgraded from Play Framework 2.4.3 to 2.5.0 (Java). Since the upgrade, our load tests start timing out after a couple of minutes. Before the upgrade they ran for an hour without errors.

It looks like some threads are getting blocked, and the system simply stops responding.

I am running a smaller version of the load test locally on my machine with the YourKit Java Profiler attached. Initially, 16 netty-event-loop threads are started. After about a minute, I can see that they have started blocking:

[Screenshot: blocked netty-event-loop threads]

When they block I start getting timeouts in the load test. When I turn off the test, these threads seem to recover:

[Screenshot: recovering netty-event-loop threads]

I am hoping someone here can help us determine what is causing this. We have not modified our code at all apart from the changes needed to upgrade to Play 2.5.

Here is the Akka thread pool config we're using in application.conf:

akka {
  fork-join-executor {
    # The parallelism factor is used to determine thread pool size using the
    # following formula: ceil(available processors * factor). Resulting size
    # is then bounded by the parallelism-min and parallelism-max values.
    parallelism-factor = 3.0

    # Min number of threads to cap factor-based parallelism number to
    parallelism-min = 8

    # Max number of threads to cap factor-based parallelism number to
    parallelism-max = 64

    # Setting to "FIFO" to use queue like peeking mode which "poll" or "LIFO" to use stack
    # like peeking mode which "pop".
    task-peeking-mode = "FIFO"
  }
}
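To make the sizing formula in the comments above concrete, here is a minimal sketch of the arithmetic, assuming the min and max bounds are applied as a simple clamp, as the comments describe. `PoolSize` and `parallelism` are hypothetical names for illustration, not part of the Akka API.

```java
// Hypothetical helper that mirrors the formula from the config comments:
// ceil(available processors * factor), bounded by parallelism-min and parallelism-max.
class PoolSize {
    static int parallelism(int processors, double factor, int min, int max) {
        int scaled = (int) Math.ceil(processors * factor);
        // Clamp the scaled value into [min, max].
        return Math.min(Math.max(scaled, min), max);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Values taken from the config above: factor 3.0, min 8, max 64.
        System.out.println("cores=" + cores
                + " -> parallelism=" + parallelism(cores, 3.0, 8, 64));
    }
}
```

With the values above, a machine with 8 processors would get 24 threads, a machine with 2 processors would be raised to the minimum of 8, and a machine with 32 processors would be capped at 64.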

The profiler shows the following info about the blocked threads:

[Screenshot: information from the thread monitor]
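For anyone trying to reproduce this without a profiler, the same kind of information can be collected with the JDK's own `ThreadMXBean`. The following is only a rough sketch: `BlockedThreads` and `blocked` are hypothetical names, the thread-name prefix is taken from the thread names seen in the profiler, and the code has to run inside the same JVM as the application (for example from a temporary debug endpoint) to see its threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists threads in state BLOCKED whose name starts with a prefix.
class BlockedThreads {
    static List<String> blocked(String namePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> result = new ArrayList<>();
        // false, false: skip locked monitor/synchronizer details to keep the dump cheap.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread terminated while the dump was taken
            }
            if (info.getThreadName().startsWith(namePrefix)
                    && info.getThreadState() == Thread.State.BLOCKED) {
                result.add(info.getThreadName()
                        + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Run standalone, this only sees its own JVM, so the list is normally empty.
        for (String line : blocked("netty-event-loop")) {
            System.out.println(line);
        }
    }
}
```

The lock owner name is the interesting part: if every blocked event-loop thread is waiting on a monitor held by the same thread, that thread's stack trace is the place to look.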

Can anyone provide some insight into what we might be doing wrong? Thanks for your help.

Are you using sbt run, or are you running your app in production mode? - marcospereira
For the local test (from which these screenshots are taken), I am running the application in DEV mode using Lightbend Activator / sbt run. But in our test environment we're running in production mode. - OGG
Same here, after upgrading to 2.5.0. We are not using Deadbolt. Still searching for the bug. - gun
I would be glad if anybody has a hint. I am not even sure whether it is a Netty issue. This is what it looks like with my application: requests are answered very quickly, and then the backend suddenly stops answering. After a while (about 1 minute) it recovers and works again. It looks like there is no thread left to dispatch the requests... - gun
Sorry to pollute this thread, but I found out that it only happens on PUT methods. None of my GET and POST methods cause problems. I even swapped the functionality between my GET and PUT endpoints, and it is reproducible: only PUTs are affected. A Netty problem? Anybody else with the same issue? - gun

1 Answer

This issue seems to be resolved for us. We were using Deadbolt-java 2.5.0-SNAPSHOT for authorization in templates and controllers. We were seeing some timeout messages in our logs related to Deadbolt.

So we completely removed Deadbolt from our project, and now the load tests run faster than ever.