2
votes

I'm consistently seeing very long delays (60+ seconds) between two actors, from the time at which the first actor sends a message for the second, and when the second actor's onReceive method is actually called with the message. What kinds of things can I look for to debug this problem?

Details

Each instance of ActorA is sending one message for ActorB with ActorRef.tell(Object, ActorRef). I collect a millisecond timestamp (with System.currentTimeMillis()) right after calling the tell method in ActorA, and getting another one at the start of ActorB's onReceive(Object). The interval between these timestamps is consistently 60 seconds or more. Specifically, when plotted over time, this interval follows a rough saw tooth pattern that ranges from more 60 second to almost 120 seconds, as shown in the graph below.

enter image description here

These actors are early in the data flow of the system, there are several other actors that follow after ActorB. This large gap only occurs between these two specific actors, the gaps between other pairs of adjacent actors is typically less than a millisecond, occassionally a few tens of milliseconds. Additionally, the actual time spent inside any given actor is never more than a second.

Generally, each actor in the system only passes a single message to another actor. One of the actors (subsequent to ActorB) sends a single message to each of a few different actors, and a small percentage (less than 0.1%) of the time, certain actors will send multiple messages to the same subsequent actor (i.e., multiple instances of the subsequent actor will be demanded). When this occurs, the number of multiple messages is typically on the order of a dozen or less.

Can this be explained (explicitely) by the normal reactive nature of Akka? Does it indicate a problem with the way work is distributed or the way the actors are configured? Is there something that can explicitly block a particular actor from spinning up? What other information should I collect or look at to understand the source of this, or to understand whether or not it is actually a problem?

1
I presume, these actors are within the same JVM, i.e. no remoting is involved?Haspemulator
Yes, that's correct, no remotes, all actors on the same jvmbrianmearns

1 Answers

1
votes

You have a limited thread pool. If your Actors block, they still take up space in the thread pool. New threads will not be created if your thread pool is saturated.

You may want to configure core-pool-size-factor, core-pool-size-min, and core-pool-size-max.

If you expect certain actions to block, you can instead wrap them in Future { blocking { ... } } and register a callback. But it's better to use asynchronous, non-blocking calls.