Sudden "RedisConnectionException: No connection is active/available to service this operation" on v2.1.30 #1510

@jbayardo

We have many examples where some of our machines (which otherwise seem to be running just fine) suddenly start throwing extremely high numbers of RedisConnectionException. This happens on multiple machines at the same time.

Here are a couple of sample traces from our logs:

StackExchange.Redis.RedisConnectionException: No connection is active/available to service this operation: SET MW_S13PDV4MD5:DB9F6CB7873922C015E0, mc: 1/1/0, mgr: 10 of 10 available, clientName: MW1PAPS0A0FF1A4, IOCP: (Busy=0,Free=1000,Min=801,Max=1000), WORKER: (Busy=14,Free=32753,Min=801,Max=32767), v: 2.1.30.38891
   at StackExchange.Redis.RedisBatch.Execute() in /_/src/StackExchange.Redis/RedisBatch.cs:line 36
   at BuildXL.Cache.ContentStore.Distributed.Redis.RedisBatch.ExecuteBatchOperationAndGetCompletion(Context context, IDatabase database)
   at BuildXL.Cache.ContentStore.Distributed.Redis.RedisDatabaseAdapter.<>c__DisplayClass22_0.<<ExecuteBatchOperationAsync>b__2>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at BuildXL.Cache.ContentStore.Distributed.Redis.RetryPolicyExtensions.<ExecuteAsync>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at BuildXL.Cache.ContentStore.Distributed.Redis.RedisDatabaseAdapter.<>c__DisplayClass22_0.<<ExecuteBatchOperationAsync>b__0>d.MoveNext()
   --- End of inner exception stack trace ---
---> (Inner Exception #0) StackExchange.Redis.RedisConnectionException: No connection is active/available to service this operation: SET MW_S13PDV4MD5:DB9F6CB7873922C015E0, mc: 1/1/0, mgr: 10 of 10 available, clientName: MW1PAPS0A0FF1A4, IOCP: (Busy=0,Free=1000,Min=801,Max=1000), WORKER: (Busy=14,Free=32753,Min=801,Max=32767), v: 2.1.30.38891
StackExchange.Redis.RedisConnectionException: No connection is active/available to service this operation: SET MW_S13PDV4VSO0:68D9FBB47BB33709B4A8, mc: 1/1/0, mgr: 10 of 10 available, clientName: MW1PAPS0A0FF26E, IOCP: (Busy=0,Free=1000,Min=801,Max=1000), WORKER: (Busy=2,Free=32765,Min=801,Max=32767), v: 2.1.30.38891
   at StackExchange.Redis.RedisBatch.Execute() in /_/src/StackExchange.Redis/RedisBatch.cs:line 36
   at BuildXL.Cache.ContentStore.Distributed.Redis.RedisBatch.ExecuteBatchOperationAndGetCompletion(Context context, IDatabase database)
   at BuildXL.Cache.ContentStore.Distributed.Redis.RedisDatabaseAdapter.<>c__DisplayClass22_0.<<ExecuteBatchOperationAsync>b__2>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at BuildXL.Cache.ContentStore.Distributed.Redis.RetryPolicyExtensions.<ExecuteAsync>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at BuildXL.Cache.ContentStore.Distributed.Redis.RedisDatabaseAdapter.<>c__DisplayClass22_0.<<ExecuteBatchOperationAsync>b__0>d.MoveNext()
   --- End of inner exception stack trace ---
---> (Inner Exception #0) StackExchange.Redis.RedisConnectionException: No connection is active/available to service this operation: SET MW_S13PDV4VSO0:68D9FBB47BB33709B4A8, mc: 1/1/0, mgr: 10 of 10 available, clientName: MW1PAPS0A0FF26E, IOCP: (Busy=0,Free=1000,Min=801,Max=1000), WORKER: (Busy=2,Free=32765,Min=801,Max=32767), v: 2.1.30.38891
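
For triage across many machines, the counters appended to each exception message can be pulled out of the logs mechanically. The following is a minimal Python sketch, not part of the application (which is C#); the function name `parse_diagnostics` and the output keys are hypothetical, and it only covers the fields visible in the traces above.

```python
import re


def parse_diagnostics(message: str) -> dict:
    """Extract the counters appended to a StackExchange.Redis exception message."""
    out = {}
    # Thread-pool sections look like "IOCP: (Busy=0,Free=1000,Min=801,Max=1000)".
    for name, body in re.findall(r"(IOCP|WORKER): \(([^)]*)\)", message):
        out[name] = {
            key: int(value)
            for key, value in (pair.split("=") for pair in body.split(","))
        }
    # "mgr: 10 of 10 available"
    mgr = re.search(r"mgr: (\d+) of (\d+) available", message)
    if mgr:
        out["mgr_available"] = int(mgr.group(1))
        out["mgr_total"] = int(mgr.group(2))
    # Library version, e.g. "v: 2.1.30.38891"
    version = re.search(r"\bv: ([\d.]+)", message)
    if version:
        out["version"] = version.group(1)
    return out


# Message copied from the first trace above.
sample = (
    "No connection is active/available to service this operation: "
    "SET MW_S13PDV4MD5:DB9F6CB7873922C015E0, mc: 1/1/0, mgr: 10 of 10 available, "
    "clientName: MW1PAPS0A0FF1A4, IOCP: (Busy=0,Free=1000,Min=801,Max=1000), "
    "WORKER: (Busy=14,Free=32753,Min=801,Max=32767), v: 2.1.30.38891"
)
info = parse_diagnostics(sample)
print(info["IOCP"], info["WORKER"], info["mgr_available"], info["version"])
```

In both traces the busy thread counts (IOCP Busy=0, WORKER Busy=14 and Busy=2) are far below the configured minimum of 801, so thread-pool starvation does not look like the cause.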

Here's what we have looked into:

  • We have tried re-creating the ConnectionMultiplexer after a certain number of exceptions. This doesn't work; the errors come back.
  • Restarting the process seems to work at least temporarily, but is not really a long-term option for us.
  • We have looked into network connectivity and have performed packet captures. We don't seem to have any issues in this area (this is also inside a datacenter, so network is relatively stable).
  • The Redis instances are Azure Cache for Redis. Connection strings do include ssl, and we have StackExchange.Redis's internal logs, which show we are connecting with Tls1.2, as expected. No errors are shown at the time these exceptions are thrown.
  • There are no ongoing maintenance operations whatsoever from Azure Cache for Redis at the times this happens (i.e. no upgrades, no cluster size changes, or anything like that).
  • Azure's metrics show that the number of connected clients remains stable, even though the number of affected machines is high enough that we'd expect to see a change there if connections were actually being lost.
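
For anyone trying to reproduce the setup, the re-creation strategy from the first bullet can be sketched roughly as below. This is an illustration in Python, not our actual C# code: `RecreatingConnection`, `factory`, and `threshold` are hypothetical names, `factory` stands in for whatever builds the ConnectionMultiplexer, and counting consecutive failures (rather than total failures) is an assumption. The sketch does not dispose of the old connection.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RecreatingConnection(Generic[T]):
    """Holds a connection and rebuilds it after `threshold` consecutive errors."""

    def __init__(self, factory: Callable[[], T], threshold: int = 5):
        self._factory = factory      # hypothetical: builds a fresh connection
        self._threshold = threshold  # consecutive failures before rebuilding
        self._lock = threading.Lock()
        self._errors = 0
        self._conn = factory()
        self.rebuilds = 0            # exposed for monitoring

    def run(self, op: Callable[[T], R]) -> R:
        conn = self._conn
        try:
            result = op(conn)
        except ConnectionError:
            with self._lock:
                # Only count failures against the connection currently in use,
                # so stale callers cannot trigger a second rebuild.
                if conn is self._conn:
                    self._errors += 1
                    if self._errors >= self._threshold:
                        self._conn = self._factory()
                        self._errors = 0
                        self.rebuilds += 1
            raise
        with self._lock:
            if conn is self._conn:
                self._errors = 0  # a success resets the streak
        return result
```

Callers still see the original exception; the wrapper only decides when the next call gets a fresh connection.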

All of these findings lead us to believe this may be a bug in the library, even though our initial guess was that it'd be the network.
