Poor performance with Azure Cache

  • Question

  •  

    After switching a couple of database calls to cache, we actually saw worse performance. According to New Relic, both CLR time and response time jumped sharply. Please see the attached graph for the jump (cache was introduced 1/5 at 0:00). The *only* thing that changed was the introduction of Azure AppFabric Cache. Our cache client uses a singleton pattern, so there is only one per instance of the web service. The cache factory is created once and then stored away, so we are not incurring the overhead of opening the connection each time.

    Furthermore, New Relic reports that a cache call takes 15 ms on average. In many cases, 15 ms is slower than the database!

    The object we are sticking into cache consists of two byte arrays: one has a length of about 421 and the other has a length of 8.

    I don't really understand why introducing cache increased response time. Is a byte array not cache friendly?

    My class looks like this (the only two properties populated before the object is put into cache are the two byte arrays; everything else is left at its default value):

     

    [Table]
    public class GameState
    {
        [Column(IsPrimaryKey = true, IsDbGenerated = true, AutoSync = AutoSync.OnInsert)]
        public int Id { get; set; }

        [Column(UpdateCheck = UpdateCheck.Never, Name = "game_id")]
        public int GameId { get; set; }

        [Column(UpdateCheck = UpdateCheck.Never, Name = "player_id")]
        public int PlayerId { get; set; }

        // Has a length of around 421.
        [Column(UpdateCheck = UpdateCheck.Never, DbType = "VarBinary(max)")]
        public byte[] State { get; set; }

        [Column(UpdateCheck = UpdateCheck.Never, IsDbGenerated = true, AutoSync = AutoSync.OnInsert)]
        public DateTime Created { get; set; }

        // Has a length of 8.
        [Column(UpdateCheck = UpdateCheck.Never, Name = "update", IsDbGenerated = true, DbType = "timestamp")]
        public byte[] TimeStamp { get; set; }
    }
    

    Thanks

     


    -justin
    • Edited by odyth Friday, January 6, 2012 8:00 AM added more info
    Friday, January 6, 2012 7:57 AM

Answers

  • I gave up and ended up moving to Amazon EC2, where I see zero slowdown on the cache, so I know it wasn't my code but Azure. Not sure what was going on. It would be really helpful if the Azure Management Console gave you access to performance monitoring tools for your cache.
    -justin
    • Marked as answer by odyth Tuesday, January 31, 2012 2:53 AM
    Tuesday, January 31, 2012 2:53 AM

All replies

  • Hi,

    I am trying to involve someone familiar with Azure Cache performance. There will be a short delay; sorry for any inconvenience.

    Thank you


    Please mark the replies as answers if they help or unmark if not. If you have any feedback about my replies, please contact msdnmg@microsoft.com Microsoft One Code Framework
    Saturday, January 7, 2012 3:09 AM
  • Hi,

    Is your cache client a cloud service (web role, etc.)? The Azure Caching service is mainly intended for services hosted on the Azure cloud. If you use it from on-premises applications, average performance may not improve compared with accessing a local database directly (assuming you don't enable local cache).

    If it's used by a cloud service, is the Caching service in the same Country/Region as the cloud service? Have you enabled local cache?


    Allen Chen [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.



    Monday, January 9, 2012 6:17 AM
  • Everything is in Azure. Everything is located in North Central US. I have 11 large instances, each with 6 connections to a 4 GB cache instance.

    I don't have local cache set, as I want the 11 services to share the same cache. Should I have the local cache properties enabled? What does that do?

     

    Thanks,


    -justin
    Monday, January 9, 2012 6:28 AM
  • >I don't have local cache set, as I want the 11 services to share the same cache. Should I have the local cache properties enabled? What does that do?

    Local cache stores data locally on each role instance and thus greatly improves read performance. It's recommended if your data is read-only.

    http://www.windowsazure.com/en-us/home/tour/caching/

    If your data is updated frequently and your application requires every instance to see up-to-date data, you'd better not use it and instead rely fully on the cache tier.

    As for the original problem, it looks strange, as the object retrieval time should be around 6 ms:

    http://windowsazurecat.com/2011/07/reaching-stable-performance-in-appfabric-cache-with-a-non-idle-cache-channel/

    Could you test a single cache query and post the test code here?


    Allen Chen [MSFT]


    Monday, January 9, 2012 8:04 AM
  • Hi Allen,

    The objects in cache are very volatile, so storing them in local memory is useless for us. We also have a really high load, usually 20k requests a minute, and every call pulls something from cache.

    Below is the cache client I use to access the cache.

    public class DiceCacheClient
    {
        private const int MAX_RETRIES = 5;
        private static volatile DiceCacheClient instance;
        private static object syncRoot = new object();

        private DataCache dataCache;
        private DataCacheFactory cacheFactory;

        public DiceCacheClient()
        {
            cacheFactory = new DataCacheFactory();

            dataCache = cacheFactory.GetDefaultCache();
        }

        // Double-checked locking: the factory and cache are created once per process.
        private static DiceCacheClient Instance
        {
            get
            {
                if (instance == null)
                {
                    lock (syncRoot)
                    {
                        if (instance == null)
                        {
                            instance = new DiceCacheClient();
                        }
                    }
                }
                return instance;
            }
        }

        // Read-through get: on a miss (or a cache error) fall back to creator() and cache the result.
        public static T Get<T>(string key, int daysToCache, Func<T> creator)
        {
            T item = default(T);
            try
            {
                item = (T)Instance.dataCache.Get(key);
                if (item != null)
                {
                    return item;
                }
            }
            catch (DataCacheException ex)
            {
                Utility.LogException(ex, string.Format("errorCode={0}", ex.ErrorCode), "DiceCacheClient Get", ServiceType.GameService);
            }
            catch (Exception ex)
            {
                Utility.LogException(ex, "DiceCacheClient Get", ServiceType.GameService);
            }

            item = creator();
            if (item != null)
            {
                Put<T>(key, daysToCache, item);
            }
            return item;
        }

        public static bool Put<T>(string key, int daysToCache, T value)
        {
            return Put<T>(key, daysToCache, value, MAX_RETRIES);
        }

        // Retries only on RetryLater, sleeping 3 ms between attempts.
        public static bool Put<T>(string key, int daysToCache, T value, int retry)
        {
            try
            {
                Instance.dataCache.Put(key, value, TimeSpan.FromDays(daysToCache));
                return true;
            }
            catch (DataCacheException ex)
            {
                if (retry > 0 && ex.ErrorCode == DataCacheErrorCode.RetryLater)
                {
                    Thread.Sleep(3);
                    return Put<T>(key, daysToCache, value, retry - 1);
                }
                Utility.LogException(ex, string.Format("key={0}, errorCode={1}", key, ex.ErrorCode), "DiceCacheClient Put", ServiceType.GameService);
            }
            catch (Exception ex)
            {
                Utility.LogException(ex, string.Format("key={0}", key), "DiceCacheClient Put", ServiceType.GameService);
            }
            return false;
        }
    }
    

    web.config looks like this: hostname is the hostname from azure and the key is the correct key

    <dataCacheClient maxConnectionsToServer="6">
      <hosts>
        <host name="hostname" cachePort="22233" />
      </hosts>

      <securityProperties mode="Message">
        <messageSecurity authorizationInfo="key">
        </messageSecurity>
      </securityProperties>
    </dataCacheClient>
    

    now our service header looks like this:

     

    [ServiceContract]
    [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
    [ServiceBehavior(InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Multiple)]
    

    So there should be one instance of the service, which means one instance of the singleton inside the DiceCacheClient class. I have tested this by simulating load. When I simulate load with just a few clients it's pretty fast; as soon as I put it under real load it all goes bad.


    -justin
    Monday, January 9, 2012 8:17 AM
  • Hi,

    >I have tested this by simulating load.  Now when I simulate load with just a few clients its pretty fast, as soon as i put it under real load it all goes bad.

    This seems to imply the caching actually works well. Could you check whether you get any quota-exceeded exceptions? What's your cache size? You may want a bigger cache size, as your app seems to require a lot of bandwidth and transactions. 20k requests per minute in your scenario means at least 1200k transactions per hour (if you make X cache requests for each client request, you'll have X*1200k transactions), so you should choose at least 512MB. If your cached data is large you should also consider bandwidth.

    Cache Size   Transactions Per Hour   Bandwidth MB Per Hour   Concurrent Connections
    128MB        400000                  1400                    10
    256MB        800000                  2800                    10
    512MB        1600000                 5600                    20
    1GB          3200000                 11200                   40
    2GB          6400000                 22400                   80
    4GB          12800000                44800                   160

     http://msdn.microsoft.com/en-us/library/hh697522.aspx
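
    The sizing arithmetic above can be sketched in code. This is only an illustration: the quota numbers are copied from the table, while the `CacheTier` and `CacheTierPicker` names are hypothetical helpers and not part of any Azure SDK. It checks the transaction quota only, not bandwidth or connections.

```csharp
using System;
using System.Linq;

// One row of the quota table above. Hypothetical helper type, not an SDK class.
public sealed class CacheTier
{
    public string Name { get; private set; }
    public long TransactionsPerHour { get; private set; }
    public int BandwidthMbPerHour { get; private set; }
    public int ConcurrentConnections { get; private set; }

    public CacheTier(string name, long transactionsPerHour, int bandwidthMbPerHour, int concurrentConnections)
    {
        Name = name;
        TransactionsPerHour = transactionsPerHour;
        BandwidthMbPerHour = bandwidthMbPerHour;
        ConcurrentConnections = concurrentConnections;
    }
}

public static class CacheTierPicker
{
    // Tiers in ascending order, values taken from the table in this reply.
    public static readonly CacheTier[] Tiers =
    {
        new CacheTier("128MB", 400000, 1400, 10),
        new CacheTier("256MB", 800000, 2800, 10),
        new CacheTier("512MB", 1600000, 5600, 20),
        new CacheTier("1GB", 3200000, 11200, 40),
        new CacheTier("2GB", 6400000, 22400, 80),
        new CacheTier("4GB", 12800000, 44800, 160),
    };

    // Returns the smallest tier whose hourly transaction quota covers the load,
    // or null when even the largest tier is too small.
    public static CacheTier SmallestTierFor(long requestsPerMinute, int cacheCallsPerRequest)
    {
        long transactionsPerHour = requestsPerMinute * 60 * cacheCallsPerRequest;
        return Tiers.FirstOrDefault(t => t.TransactionsPerHour >= transactionsPerHour);
    }
}
```

    With 20k requests per minute and one cache call per request, `CacheTierPicker.SmallestTierFor(20000, 1)` selects the 512MB row, which matches the estimate above; with two cache calls per request it selects 1GB.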


    Allen Chen [MSFT]

     

     








    • Edited by Allen Chen - MSFT Wednesday, January 11, 2012 1:44 AM
    • Marked as answer by Arwind - MSFT Thursday, January 12, 2012 2:37 AM
    • Unmarked as answer by odyth Thursday, January 12, 2012 2:42 AM
    Tuesday, January 10, 2012 10:00 AM
  • As I said above, I have the 4 GB cache limit and am not exceeding any quotas. Every exception our application could generate is logged, and there aren't any for Azure cache.

     

    ">I have tested this by simulating load.  Now when I simulate load with just a few clients its pretty fast, as soon as i put it under real load it all goes bad.

    This seems implies the caching actually works well. "

     

    How does putting load on the cache and having its performance times increase (cache slows down) indicate that cache is actually working well?


    -justin
    Thursday, January 12, 2012 2:45 AM
  • I even changed my configuration from 11 large servers with 6 connections to 5 XL servers with 20 connections to the cache, and the performance is the same.


    -justin
    Thursday, January 12, 2012 2:47 AM
  • Hi,

    >How does putting load on the cache and having its performance times increase (cache slows down) indicate that cache is actually working well?

    I said this because the caching service itself is unlikely to be the problem, as it has passed lots of tests at Microsoft. If it works under light load it should work under heavy load (within quota). So in my opinion, this behavior indicates the cause is something related to app code.

    I just did a test. Below is the full test code. I have 1 instance of a WebRole and 8 instances of a WorkerRole, each sending an HTTP request to the WebRole every 50 ms. For each request the WebRole queries the caching service, and performance data for the Get cache request is logged.

    WebRole code:

    public partial class _Default : System.Web.UI.Page
    {
        static DataCacheFactory cacheFactory = new DataCacheFactory();
        static object _mylock = new object();
        static List<long> _performanceData = new List<long>();

        protected void Page_Load(object sender, EventArgs e)
        {
            DataCache cache = cacheFactory.GetDefaultCache();
            var requestQueryString = Request.QueryString;
            if (requestQueryString["Method"] == "PutCache")
            {
                cache.Put("key", new GameState() { State = new byte[421] });
            }
            else if (requestQueryString["Method"] == "GetCache")
            {
                Stopwatch sw = new Stopwatch();
                sw.Start();
                var result = cache.Get("key");
                sw.Stop();
                // Log first 100000 requests
                lock (_mylock)
                {
                    if (_performanceData.Count < 100000)
                    {
                        _performanceData.Add(sw.ElapsedMilliseconds);
                    }
                }
            }

            // Read the shared list under the same lock, and skip Average() on an
            // empty list (it throws InvalidOperationException).
            int count;
            double average;
            lock (_mylock)
            {
                count = _performanceData.Count;
                average = count > 0 ? _performanceData.Average() : 0;
            }
            this.Label1.Text = count + " Requests Logged. AVG Request Time: " + average + " MS";
        }
    }

    public class GameState
    {
        public int Id { get; set; }

        public int GameId { get; set; }

        public int PlayerId { get; set; }

        // Has a length of around 421.
        public byte[] State { get; set; }

        public DateTime Created { get; set; }

        // Has a length of 8.
        public byte[] TimeStamp { get; set; }
    }

     

    aspx:

      <asp:Label ID="Label1" runat="server" Text="Label"></asp:Label> 

    Worker Role code:

    public class WorkerRole : RoleEntryPoint
    {
        static Random r = new Random();

        public override void Run()
        {
            // This is a sample worker implementation. Replace with your logic.
            Trace.WriteLine("$projectname$ entry point called", "Information");

            while (true)
            {
                Thread.Sleep(50);
                try
                {
                    // Dispose the WebClient after each request instead of leaving one per iteration.
                    using (WebClient wc = new WebClient())
                    {
                        // Roughly half of the requests are Gets and half are Puts.
                        int result = r.Next(0, 100);
                        if (result % 2 == 0)
                        {
                            wc.DownloadString(@"http://mydnsname.cloudapp.net/Default.aspx?Method=GetCache");
                        }
                        else
                        {
                            wc.DownloadString(@"http://mydnsname.cloudapp.net/Default.aspx?Method=PutCache");
                        }
                    }
                }
                catch (Exception ex)
                {
                    // Log failed requests rather than swallowing them silently.
                    Trace.WriteLine(ex.Message, "Error");
                }

                Trace.WriteLine("Working", "Information");
            }
        }

        public override bool OnStart()
        {
            // Set the maximum number of concurrent connections
            ServicePointManager.DefaultConnectionLimit = 100;

            // For information on handling configuration changes
            // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.

            return base.OnStart();
        }
    }

    I kept them running for a while and then opened Default.aspx to see the results, as below. Though the load is not as high as yours, it's close (around 10k per minute). I'm not sure what the problem might be so far. Could you compare my test code with yours to see whether there are any differences?
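
    One caveat when comparing numbers from a test like this: `Stopwatch.ElapsedMilliseconds` is a whole number of milliseconds, and an average can hide a slow tail that only appears under load. A minimal sketch of recording fractional milliseconds and reporting percentiles follows. `LatencyLog` is a hypothetical helper written for this illustration, not part of the test code above or of any SDK.

```csharp
using System;
using System.Collections.Generic;

// Thread-safe collection of latency samples with nearest-rank percentiles.
public sealed class LatencyLog
{
    private readonly object _gate = new object();
    private readonly List<double> _samplesMs = new List<double>();

    // Pass Stopwatch.Elapsed here to keep sub-millisecond precision.
    public void Record(TimeSpan elapsed)
    {
        lock (_gate)
        {
            _samplesMs.Add(elapsed.TotalMilliseconds);
        }
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public double Percentile(double p)
    {
        if (p <= 0 || p > 100)
        {
            throw new ArgumentOutOfRangeException("p", "Percentile must be in (0, 100].");
        }

        double[] sorted;
        lock (_gate)
        {
            sorted = _samplesMs.ToArray();
        }
        if (sorted.Length == 0)
        {
            throw new InvalidOperationException("No samples recorded.");
        }

        Array.Sort(sorted);
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[Math.Max(rank, 1) - 1];
    }
}
```

    In the page above, this would mean calling `Record(sw.Elapsed)` after each Get and showing, say, `Percentile(50)` and `Percentile(99)` next to the average.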

     


    Allen Chen [MSFT]











    Thursday, January 12, 2012 7:45 AM
  • I will work through your example later today.  

     

    I found this error in the database today. If you read through the exception you see this:

    "The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:00:40."

    So it was waiting on the cache for up to 40 seconds!

     

    Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.) ---> System.ServiceModel.CommunicationException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:00:40. The time allotted to this operation may have been a portion of a longer timeout. ---> System.ObjectDisposedException: The socket connection has been disposed.  Object name: 'System.ServiceModel.Channels.SocketConnection'.
       --- End of inner exception stack trace ---
       at System.ServiceModel.Channels.SocketConnection.ThrowIfNotOpen()
       at System.ServiceModel.Channels.SocketConnection.BeginRead(Int32 offset, Int32 size, TimeSpan timeout, WaitCallback callback, Object state)
       at System.ServiceModel.Channels.SessionConnectionReader.BeginReceive(TimeSpan timeout, WaitCallback callback, Object state)
       at System.ServiceModel.Channels.SynchronizedMessageSource.ReceiveAsyncResult.PerformOperation(TimeSpan timeout)
       at System.ServiceModel.Channels.SynchronizedMessageSource.SynchronizedAsyncResult`1..ctor(SynchronizedMessageSource syncSource, TimeSpan timeout, AsyncCallback callback, Object state)
       at System.ServiceModel.Channels.FramingDuplexSessionChannel.BeginReceive(TimeSpan timeout, AsyncCallback callback, Object state)
       at Microsoft.ApplicationServer.Caching.WcfClientChannel.CompleteProcessing(IAsyncResult result)
       --- End of inner exception stack trace ---
       at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody)
       at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)
       at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.<Get>b__48()
       at Yacht.Domain.DiceCacheClient.Get[T](String key, Int32 daysToCache, Func`1 creator) in C:\Users\DevBox\Documents\Visual Studio 2010\Projects\with-buddies-server\Yacht.Domain\DiceCacheClient.cs:line 84


    -justin
    Thursday, January 12, 2012 7:38 PM
  • Hi,

    Have you tested my code? Does it give the same result? As for the DataCacheException you mentioned, how frequently does it happen?


    Allen Chen [MSFT]




    Monday, January 16, 2012 9:39 AM
  • Our cache load doesn't approach what Justin is describing. We are seeing this exception frequently. 
    Thursday, January 19, 2012 9:33 PM
  • >Our cache load doesn't approach what Justin is describing. We are seeing this exception frequently.


    Hi,

    Do you see the same SubStatus ES0006? If so, please try catching the exception and adding some retry logic. If that doesn't help, you can contact our customer support for further investigation:

    http://www.windowsazure.com/en-us/support/contact/
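
    As a rough illustration of the retry logic suggested above, here is a minimal, library-independent sketch. The `Retry` class, its parameters, and the doubling delay are assumptions made for this example; the caller decides which exceptions count as transient, so nothing here depends on the caching SDK.

```csharp
using System;
using System.Threading;

// Hypothetical helper: retries an operation on transient failures with a doubling delay.
public static class Retry
{
    // Runs operation up to maxAttempts times in total. A failure is retried only
    // when isTransient(ex) returns true; any other failure, or the last failed
    // attempt, is rethrown to the caller.
    public static T Execute<T>(Func<T> operation, Func<Exception, bool> isTransient, int maxAttempts, TimeSpan initialDelay)
    {
        if (operation == null) throw new ArgumentNullException("operation");
        if (isTransient == null) throw new ArgumentNullException("isTransient");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                if (attempt >= maxAttempts || !isTransient(ex))
                {
                    throw;
                }
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}

// Possible use with the cache client posted earlier in this thread (not compiled here):
//
// var state = Retry.Execute(
//     () => (GameState)dataCache.Get(key),
//     ex => ex is DataCacheException
//           && ((DataCacheException)ex).ErrorCode == DataCacheErrorCode.RetryLater,
//     5,
//     TimeSpan.FromMilliseconds(3));
```

    Retrying only helps when the failure is short-lived. If each attempt itself blocks for a long timeout, retries add to the total wait rather than hiding it.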


    Allen Chen [MSFT]

    • Marked as answer by Allen Chen - MSFT Tuesday, January 31, 2012 2:40 AM
    • Unmarked as answer by odyth Tuesday, January 31, 2012 2:52 AM
    Friday, January 20, 2012 1:31 AM