Redis Cache Refusing Connections

  • Question

  • From @gavinbarron via Twitter:


    Cache on Azure suddenly refusing connections and consuming all of the CPU. Can't even connect via redis-cli at the moment.


    Regards,
    @AzureSupport

    Monday, July 6, 2015 11:53 PM

All replies

  • I have a redis cache which was set at 1GB.

    This cache is located at wpcconnect.redis.cache.windows.net

    Approximately 6 hours ago (0600 NZT) users started reporting issues. Via application logging I found that Redis was refusing client connections (RedisConnectionException: It was not possible to connect to the redis server(s); to create a disconnected multiplexer, disable AbortOnConnectFail.)

    When I examined the server, the Server Load metric was at 100, as was CPU usage. Memory usage is sitting at 496MB.

    The number of connections was approximately 8K, which struck me as higher than it should be.

    Due to the unresponsive nature of the cache I have created a second cache and pointed my client app at the new cache.

    However, since 'failing over' to the new cache, the old cache still has ~8K open connections.


    Tuesday, July 7, 2015 12:17 AM
  • I have checked a few things about the cache wpcconnect.redis.cache.windows.net. It is a 2.5GB cache. Server load is 100%. The number of connected clients is around 7300, which is very high for a 2.5 GB cache.

    Please provide us with the following details to help us investigate further:
    1.Cache Name
    2.Cache Size
    3.Date and time of errors (including timezone)
    4.Exception messages with full stack trace
    5.Number and type of client instances (e.g. web site, web role, worker role, VM)
    6.Public Virtual IP (VIP) Address of client deployments
    7.Version of StackExchange.Redis (and Microsoft.Web.RedisSessionStateProvider if applicable)
    8.Code snippet showing how you are configuring and using the ConnectionMultiplexer object. Are you sharing a single instance of ConnectionMultiplexer across the whole client process?
    9.In what region(s) are your cache service and clients?
    10.Did anything change in your client around the time of the error? Were you scaling the number of client instances up or down, or deploying a new version of the client? Does your client have auto-scale enabled?
    11.What was the CPU utilization on your client both before and during the incident?
    12.Did all requests experience high latency or timeouts at the time of the incident, or only some requests?
    13.How were the failures distributed across your clients? Evenly split, or all on a single client?
    14.What is the size of the value you are getting from or putting into the cache?
    15.What timeout values are set on the client for sync timeout and for connection timeout?


    You can contact us directly at: AzureCache@microsoft.com

    Tuesday, July 7, 2015 12:43 AM
  • 1.Cache Name:

    wpcconnect.redis.cache.windows.net

    2.Cache Size:

    Currently 2.5 GB, scaled up from 1GB after seeing server load at 100.

    3.Date and time of errors (including timezone):

    Last noted error in client at July 7th 2015, 09:27:56 (UTC +12). First occurrence today at July 7th 2015, 06:39:15 (UTC +12).

    4.Exception messages with full stack trace

    [RedisConnectionException: It was not possible to connect to the redis server(s); to create a disconnected multiplexer, disable AbortOnConnectFail. SocketFailure on PING]StackExchange.Redis.ConnectionMultiplexer.ConnectImpl(Func`1 multiplexerFactory, TextWriter log):150

    Any other lines in the stack trace are our code, above this call.

    5.Number and type of client instances (e.g. web site, web role, worker role, VM):

    Client application is a SharePoint 2013 farm running in IaaS, using Redis to cache data from a remote service. There are 4 VMs in the farm which can connect to the Redis cache.

    6.Public Virtual IP (VIP) Address of client deployments:

    wpc14p2spcldsvc.cloudapp.net [191.236.56.250]

    7.Version of StackExchange.Redis (and Microsoft.Web.RedisSessionStateProvider if applicable):

    StackExchange.Redis.StrongName 1.0.450

    8.Code snippet showing how you are configuring and using the ConnectionMultiplexer object. Are you sharing a single instance of ConnectionMultiplexer across the whole client process?

            private void Init()
            {
                if (_connectionMultiplexer == null)
                {
                    _connectionMultiplexer = ConnectionMultiplexer.Connect(_configurationOptions);
                    _cache = _connectionMultiplexer.GetDatabase();
                }
            }

    All calls which connect to Redis use the above Init method to create the connection. We take no additional steps to share the ConnectionMultiplexer instance.

            public RedisCache Build()
            {
                var configurationOptions = new ConfigurationOptions
                {
                    KeepAlive = _constants.RedisCacheKeepAliveTimeSeconds,
                    ConnectTimeout = 15000,
                    SyncTimeout = 15000,
                    Ssl = _constants.RedisSslEnabled,
                    AllowAdmin = true,
                    Password = _constants.RedisCachePassword
                };
                var endpoint = _constants.RedisCacheEndPoint;
                var port = _constants.RedisCachePort;
                configurationOptions.EndPoints.Add(endpoint, port);

                return new RedisCache(configurationOptions, _prefix);
            }

    The Build method is always used to construct our wrapper object

    9.In what region(s) are your cache service and clients?

    EastUS

    10.Did anything change in your client around the time of the error? Were you scaling the number of client instances up or down, or deploying a new version of the client? Does your client have auto-scale enabled?

    No known changes. No autoscale.

    11.What was the CPU utilization on your client both before and during the incident?

    Before: unknown, as diagnostics were not enabled. During and now: server load is at 100; CPU was at 100% but has decreased to ~85%.

    12.Did all requests experience high latency or timeouts at the time of the incident, or only some requests? 

    It looks like all requests, although this is based solely on my observations, as we only log and monitor failures and not successes.

    13.How were the failures distributed across your clients? Evenly split, or all on a single client?

    All client VMs were subject to failures

    14.What is the size of the value you are getting from or putting into the cache?

    Unsure of exact size at present. Based on figures noted during development and testing ~1MB

    15.What timeout values are set on the client for sync timeout and for connection timeout?

    15000.

    It should be noted that the client application has been re-configured to use a different Redis Cache server yet high usage and connection metrics are persisting.

    Tuesday, July 7, 2015 1:12 AM
  • From the above code it is not clear whether '_connectionMultiplexer' is shared among all objects. If it is non-static, then every object is going to create its own connection, which can cause a connection flood on the server.

    I would recommend that you just create a single ConnectionMultiplexer.  Here is the pattern we typically recommend customers use:

    private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() => {
        return ConnectionMultiplexer.Connect("mycache.redis.cache.windows.net,abortConnect=false,ssl=true,password=...");
    });
     
    public static ConnectionMultiplexer Connection {
        get {
            return lazyConnection.Value;
        }
    }

    PS: I see in the logs that most of the connections are from IP 191.236.56.250.
    Tuesday, July 7, 2015 5:52 PM
  • Thanks for that suggestion.

    We've implemented a change to ensure that access to the ConnectionMultiplexer always passes through a static method as per your suggestion. We're still seeing a steady increase in the number of connections to the Redis server over the course of the day.

    Our current cache wrapper stores the result of the GetDatabase() call in a field for use within the instance:

    public class RedisCache : IRepositoryCache
    {
     private readonly ConfigurationOptions _configurationOptions;
     private readonly CachePrefix _prefix;
     private IDatabase _cache;
     public RedisCache(ConfigurationOptions configurationOptions, CachePrefix prefix)
     {
      if (configurationOptions == null) throw new ArgumentNullException("configurationOptions");
      _configurationOptions = configurationOptions;
      _prefix = prefix;
     }
     private IDatabase Cache
     {
      get
      {
       if (_cache == null)
       {
        Init(_configurationOptions);
       }
       return _cache;
      }
     }

     private void Init(ConfigurationOptions options)
     {
      _cache = LazyConnection(options).Value.GetDatabase();
     }
     private static Lazy<ConnectionMultiplexer> LazyConnection(ConfigurationOptions options)
     {
      return new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(options));
     }

     public void ClearItem(string key)
     {
      key = _prefix + key;
      if (key == null) throw new ArgumentNullException("key");
      Cache.KeyDelete(key);
     }

     // Other cache access methods omitted for brevity
    }

    Based on the observed behavior I'm considering altering our implementation to be:

    public class RedisCache : IRepositoryCache
    {
     private readonly ConfigurationOptions _configurationOptions;
     private readonly CachePrefix _prefix;
     public RedisCache(ConfigurationOptions configurationOptions, CachePrefix prefix)
     {
      if (configurationOptions == null) throw new ArgumentNullException("configurationOptions");
      _configurationOptions = configurationOptions;
      _prefix = prefix;
     }
     private static IDatabase Cache(ConfigurationOptions options)
     {
      IDatabase cache = LazyConnection(options).Value.GetDatabase();
      return cache;
     }

     private static Lazy<ConnectionMultiplexer> LazyConnection(ConfigurationOptions options)
     {
      return new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(options));
     }

     public void ClearItem(string key)
     {
      key = _prefix + key;
      if (key == null) throw new ArgumentNullException("key");
      Cache(_configurationOptions).KeyDelete(key);
     }

     // Other cache access methods omitted for brevity

    }

     

    Wednesday, July 8, 2015 9:25 PM
  • You haven't implemented the suggested solution correctly. You have to share the ConnectionMultiplexer among all RedisCache objects, so there should be only one ConnectionMultiplexer object, static and shared among all of them.

    You have marked the methods as static, but they are called again and again, creating a new ConnectionMultiplexer for each RedisCache object. I would modify your code as below.

    public class RedisCache : IRepositoryCache
        {
            private static ConfigurationOptions _configurationOptions;
            private readonly CachePrefix _prefix;
           
            public RedisCache(ConfigurationOptions configurationOptions, CachePrefix prefix)
            {
                if (configurationOptions == null) throw new ArgumentNullException("configurationOptions");
                _configurationOptions = configurationOptions;
                _prefix = prefix;
            }


            private IDatabase Cache
            {
                get
                {
                    return Connection.GetDatabase();
                }
            }

            private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
            {
                return ConnectionMultiplexer.Connect(_configurationOptions);
            });

            public static ConnectionMultiplexer Connection
            {
                get
                {
                    return lazyConnection.Value;
                }
            }

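            public void ClearItem(string key)
            {
                // Check for null before prefixing; once the prefix is prepended the key is not expected to be null.
                if (key == null) throw new ArgumentNullException("key");
                key = _prefix + key;
                Cache.KeyDelete(key);
            }

            // Other cache access methods omitted for brevity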
        }
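
    To illustrate the difference between the two shapes, here is a minimal sketch that needs no Redis server. FakeConnection, PerCall and Shared are hypothetical names used only for this demonstration; FakeConnection stands in for ConnectionMultiplexer and simply counts how often it is constructed.

```csharp
using System;

// Hypothetical stand-in for an expensive connection; counts constructions.
public sealed class FakeConnection
{
    public static int Created;
    public FakeConnection() { Created++; }
}

public static class PerCall
{
    // A static METHOD that builds a new Lazy on every call:
    // each caller gets its own Lazy and therefore its own connection.
    public static Lazy<FakeConnection> LazyConnection()
    {
        return new Lazy<FakeConnection>(() => new FakeConnection());
    }
}

public static class Shared
{
    // A static FIELD: the Lazy is created once for the process,
    // so the factory runs at most once no matter how many callers there are.
    private static readonly Lazy<FakeConnection> lazyConnection =
        new Lazy<FakeConnection>(() => new FakeConnection());

    public static FakeConnection Connection
    {
        get { return lazyConnection.Value; }
    }
}

public static class Demo
{
    public static void Main()
    {
        for (int i = 0; i < 3; i++) { var c = PerCall.LazyConnection().Value; }
        Console.WriteLine(FakeConnection.Created); // prints 3

        FakeConnection.Created = 0;
        for (int i = 0; i < 3; i++) { var c = Shared.Connection; }
        Console.WriteLine(FakeConnection.Created); // prints 1
    }
}
```

    Marking the method static does not make the connection shared; only a single static Lazy instance does.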

    Wednesday, July 8, 2015 10:48 PM
  • Sorry, but your implementation of

    private static Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() =>
            {
                return ConnectionMultiplexer.Connect(_configurationOptions);
            });

    Is non-static as it uses the _configurationOptions member field. If I refactor your version a bit to account for this I get:

    public class RedisCache : IRepositoryCache
    {
     private readonly ConfigurationOptions _configurationOptions;
     private readonly CachePrefix _prefix;
     private readonly Lazy<ConnectionMultiplexer> _lazyConnection;
     public RedisCache(ConfigurationOptions configurationOptions, CachePrefix prefix)
     {
      if (configurationOptions == null) throw new ArgumentNullException("configurationOptions");
      _configurationOptions = configurationOptions;
      _prefix = prefix;
      _lazyConnection = new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(_configurationOptions));
     }
     private IDatabase Cache
     {
      get
      {
       return Connection.GetDatabase();
      }
     }
     public ConnectionMultiplexer Connection
     {
      get
      {
       return _lazyConnection.Value;
      }
     }

     public void ClearItem(string key)
     {
      key = _prefix + key;
      if (key == null) throw new ArgumentNullException("key");
      Cache.KeyDelete(key);
     }
     // Other cache access methods omitted for brevity
    }

    If you're looking to share a static single instance of the ConnectionMultiplexer across all instances of the RedisCache how do we pass in configuration for the connection without that class being aware of how to construct up the configuration?

    • Proposed as answer by GavinB.Net Thursday, July 9, 2015 11:48 PM
    • Unproposed as answer by GavinB.Net Friday, July 10, 2015 6:15 AM
    Wednesday, July 8, 2015 11:32 PM
  • >> Is non-static as it uses the _configurationOptions member field.

    => If you look carefully at the code, I have marked _configurationOptions as static.

    >> If you're looking to share a static single instance of the ConnectionMultiplexer across all instances of the RedisCache how do we pass in configuration for the connection without that class being aware of how to construct up the configuration?

    => You can have a public static property instead of _configurationOptions and set it before creating any instance of RedisCache.
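
    A minimal sketch of that approach, assuming a hypothetical holder class named SharedRedis (my own name, not part of StackExchange.Redis). The guard against an unset configuration is also an addition for illustration.

```csharp
using System;
using StackExchange.Redis;

// Hypothetical holder class; the name SharedRedis is an assumption.
public static class SharedRedis
{
    // Set once at application start-up, before Connection is first read.
    public static ConfigurationOptions Configuration { get; set; }

    // Single Lazy for the whole process. Lazy<T> built from a factory delegate
    // caches an exception thrown by the factory, so Configuration has to be
    // set before the first access to Connection.
    private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
        new Lazy<ConnectionMultiplexer>(() =>
        {
            if (Configuration == null)
                throw new InvalidOperationException(
                    "Set SharedRedis.Configuration before using SharedRedis.Connection.");
            return ConnectionMultiplexer.Connect(Configuration);
        });

    public static ConnectionMultiplexer Connection
    {
        get { return LazyConnection.Value; }
    }
}
```

    The code that builds the ConfigurationOptions (for example your Build method) would assign SharedRedis.Configuration once, and each RedisCache would then call SharedRedis.Connection.GetDatabase() without knowing how the configuration was constructed.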

    Thursday, July 9, 2015 12:02 AM
  • Ah! Sorry, totally missed that you'd made the _configurationOptions field static.

    I'll get this change into prod later today and hopefully we'll see some improvement on the connection usage.

    Thursday, July 9, 2015 1:00 AM
  • Looks like this is all resolved now.

    Thank you so very much, your help has been greatly appreciated :)

    Thursday, July 9, 2015 7:40 PM
  • Most welcome. Can you please mark this post as answered?
    Thursday, July 9, 2015 11:05 PM