0
votes

We are investigating an issue in an API hosted on Azure that connects to Azure Redis Cache (Standard tier, C2). From yesterday evening until early this morning (nearly 12 hours) we saw hundreds of Redis timeouts like this:

Timeout performing GET ????????:FV:Providers:Weather, inst: 1, mgr: Inactive, err: never, queue: 318, qu: 2, qs: 316, qc: 0, wr: 1, wq: 1, in: 65536, ar: 0, clientName: Items, serverEndpoint: ?????????:6380, keyHashSlot: 1586, IOCP: (Busy=1,Free=999,Min=8,Max=1000), WORKER: (Busy=66,Free=32701,Min=300,Max=32767

We don't get many visits during the night, but the errors continued until around 9 o'clock this morning. The number of items in the Redis queue reached 7000 even though traffic to our API was very low overnight.

During the day everything was fine, except for an hour this afternoon when we had a peak of visitors and the problem appeared again. We have looked at a lot of metrics: cache read/write operations are as usual, and cache hits, CPU, memory, ... all look fine.

Other APIs use the same Redis cache instance and they don't suffer from this issue. For this reason we think the Azure Redis instance is sized correctly; otherwise the other APIs would have the same problem.

Looking at the logs, we discovered that just two minutes before the timeout errors started we got more than 200 exceptions like this:

StackExchange.Redis.RedisConnectionException: UnableToResolvePhysicalConnection on GET
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
   at StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags)

We guess that the two errors are related, but we don't know whether we are doing something wrong or it was an Azure problem. Maybe the StackExchange.Redis connection was corrupted after the UnableToResolvePhysicalConnection exception and we have to restart the API to solve the problem?

Any other ideas?

Thanks for your help!

1 Answer

2
votes

StackExchange.Redis has a known problem where, in some cases, it fails to reconnect even though the server is running fine. Example: https://github.com/StackExchange/StackExchange.Redis/issues/559

I suspect you are running into that type of issue. You can verify this by trying to connect to Redis from some other machine. If that connection works, you are likely hitting this issue. Recreating your ConnectionMultiplexer should fix the problem. If you don't have a way to recreate the multiplexer, rebooting your client should fix it.
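To make "recreating the multiplexer" concrete, here is a minimal sketch of one way to structure it. The class `ReconnectingConnection<T>`, its members, and the throttling interval are hypothetical names chosen for illustration; they are not part of StackExchange.Redis. The helper is generic over any `IDisposable` connection so the snippet compiles without extra packages.

```csharp
using System;

// Hypothetical helper (not part of StackExchange.Redis): holds one shared
// connection and rebuilds it on request, at most once per minInterval.
public sealed class ReconnectingConnection<T> where T : class, IDisposable
{
    private readonly Func<T> _factory;
    private readonly TimeSpan _minInterval;
    private readonly object _gate = new object();
    private Lazy<T> _current;
    private DateTimeOffset _lastReconnect = DateTimeOffset.MinValue;

    public ReconnectingConnection(Func<T> factory, TimeSpan minInterval)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
        _minInterval = minInterval;
        _current = new Lazy<T>(_factory);
    }

    // Returns the current connection, creating it on first use.
    public T Connection
    {
        get
        {
            Lazy<T> lazy;
            lock (_gate) { lazy = _current; }
            return lazy.Value; // Lazy<T> is thread-safe by default
        }
    }

    // Swaps in a fresh connection and disposes the old one.
    // Returns false if a reconnect already happened within minInterval,
    // so a burst of failing requests does not trigger a reconnect storm.
    public bool ForceReconnect()
    {
        Lazy<T> old;
        lock (_gate)
        {
            DateTimeOffset now = DateTimeOffset.UtcNow;
            if (now - _lastReconnect < _minInterval)
            {
                return false;
            }
            _lastReconnect = now;
            old = _current;
            _current = new Lazy<T>(_factory);
        }

        if (old.IsValueCreated)
        {
            try { old.Value.Dispose(); }
            catch { /* the old connection is already broken; ignore */ }
        }
        return true;
    }
}
```

With StackExchange.Redis, the factory would typically be something like `() => ConnectionMultiplexer.Connect(connectionString)`, and `ForceReconnect()` would be called from the code path that catches `RedisConnectionException` or sees sustained timeouts. Requests still in flight on the old multiplexer may fail when it is disposed, so callers should be prepared to retry.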

I have a collection of best practices that may help you structure your code to handle such cases, covering both general guidance and StackExchange.Redis-specific recommendations: https://aka.ms/redis/bestpractices