10
votes

I'm experiencing the so-called Timeout performing HGET company:product:settings, inst: 1, queue: 8, qu=0, qs=8, qc=0, wr=0/0, in=79/1 timeout exception.

It's strange because the same Redis instance and in the same machine is storing data, but it's a specific application which throws this exception. Update: In fact, the same application, one line above receives data from Redis. The issue is with HGET.

Also, I've increased timeouts on multiplexer configuration to 6 seconds with no luck.

In addition, I've checked that IDatabase instance has IsConnected with true value.

How to interpret these error messages and what's the issue behind the whole timeout?

Some background...

I've successfully fixed the issue changing when some code section gets a database (i.e. multiplexer.GetDatabase()).

While multiplexer has an instance per AppDomain as described in StackExchange.Redis documentation, many inversion of control components are creating many instances of IDatabase in their own code. That is, IDatabase instance isn't shared.

Actual code is performing a ListRightPopLeftPush, and after that, it's instantiating an inversion of control component which reads a hash key during component instatiation. If instantiate the whole component before doing the so-called ListRightPopLeftPush, then the whole HashGet won't throw the timeout exception.

It seems like even when ListRightPopLeftPush is executed from other IDatabase instance, it produces some kind of issue in the next IDatabase instance when it comes to perform read operations.

Anyway, my fix doesn't answer the question. I've just added more detailed info to let us find what's the issue and its own solution.

Update

Anyway, the above "fix" won't fix further read accesses to Redis. I'm getting the same timeout exception in further calls. And now in paramater found in exception's message says 60/1.

1
The in=79/1 tells me that there is data available and that a reader is processing it; I will need to check to see if there is any way the reader could be stalled here.Marc Gravell♦
@MarcGravell Hey, and how you want to check that? Anyway, this is code at work, we should check that next monday...Matías Fidemraizer
@MarcGravell Nevermind, I understand you're going to check in some test ;)Matías Fidemraizer
@MarcGravell Did you get any clue about what's behind the issue? If you need further details you can ask for them and I'll add them to the questionMatías Fidemraizer
@MarcGravell I've added new details because I could "fix" the issue changing some execution order in my code. Maybe this gives you more info...Matías Fidemraizer

1 Answers

7
votes

Based on a long discussion in chat, and a lot of digging, it looks like in some obscure scenarios the TPL is hijacking the dedicated reader thread when we are doing things like .TrySetResult (which: we do often). This causes an instant deadlock if you make a synchronous call, since it can't possibly process any socket data if it is busy waiting for a task to complete (which would only ever be completed by itself). We do actually have code in place specifically to prevent this, but it looks like the workaround actually forces it to happen in some other scenarios. Which... is horrible. I will see what I can find. But basically, the problem is that currently, in some limited scenarios, TaskCompletionSource.TrySetResult is giving power to the TPL to run synchronous continuations. This includes Task.WhenAny.