Suddenly getting a lot of DistributedCache.RoutingClient timeouts in log

    Question

  • We have a cloud service deployed on Azure (West Europe) that uses Azure Managed Cache (Basic, 128MB).

    This had been working fine for several weeks, but two days ago performance suddenly dropped sharply (sometimes > 30 seconds per request), and I now have hundreds of errors in my WADLogsTable in Azure Table Storage. The errors are all the same, and look like this:

    'ERROR: <DistributedCache.RoutingClient> 3cb32c83-1fad-47f9-bc8d-383bb5f32c4a:SendMsgAndWait: Request TimedOut, msgId = 1320; TraceSource 'w3wp.exe' event'

    This error occurs every few minutes, and frequency increases under load.

    Also, I get this error in my own logs (logged after two attempts to connect to the cache have failed):

    'ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.). Additional Information : The client was trying to communicate with the server: net.tcp://blah.cache.windows.net:22233.'

    On our end there have been no changes to the way we call the cache or configuration.
    Debugging with caching enabled does not result in this error.

    Any ideas?



    • Edited by CornéMk Thursday, July 17, 2014 12:16 PM
    Thursday, July 17, 2014 11:52 AM

Answers

  • In the end we decided to move over entirely to the new Azure Redis cache. Everything is working smoothly again.

    I guess we were affected by that problem reported by Will.

    • Marked as answer by CornéMk Monday, July 21, 2014 10:35 AM
    Monday, July 21, 2014 10:35 AM

All replies

  • Hi,

    I suggest you test it again and let me know if the issue recurs.

    In the meantime, we are trying to reproduce the issue in our lab environment in order to identify the cause and a solution.

    As a workaround, please try implementing cache retry logic as described at the link below:

    http://blogs.msdn.com/b/cie/archive/2014/04/29/cache-retry-fails-what-next.aspx
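The article above describes wrapping cache calls in retry logic so that transient ERRCA0017 failures don't surface to the user. A minimal sketch of that pattern (shown here in Python purely for illustration; the exception and function names are hypothetical, not part of any Azure SDK):

```python
import random
import time


class CacheTemporaryFailure(Exception):
    """Stand-in for a transient cache error such as ERRCA0017/ES0006."""


def get_with_retry(fetch, retries=3, base_delay=0.1, sleep=time.sleep):
    """Call `fetch` and retry transient failures with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except CacheTemporaryFailure:
            if attempt == retries:
                raise  # exhausted all retries; let the caller handle it
            # Back off exponentially, with jitter so many clients
            # don't all retry at the same instant.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))


# Example: a fetch that fails twice with a transient error, then succeeds.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise CacheTemporaryFailure("ERRCA0017: temporary failure, retry later")
    return "cached-value"

value = get_with_retry(flaky_fetch, retries=3, base_delay=0, sleep=lambda s: None)
print(value)  # cached-value
```

Note that retries only mask intermittent failures; if the cache endpoint is down for an extended period (as during the outage discussed below), the final attempt will still fail and the error should be logged and handled.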

    Hope this helps !!

    Regards,

    Sowmya

    Thursday, July 17, 2014 1:51 PM
    Moderator
  • Hi,

    From Azure Health Dashboard (http://azure.microsoft.com/en-us/status/#history), we could see this message:

    Managed Cache - Multi-Region - Partial Performance Degradation

    Starting 16 July, 2014 15:30 UTC a limited subset of customers may experience intermittent connectivity failures to their Cache resources. Our Engineers have determined that this issue is not widespread. Recovery status updates will be communicated to Managed Cache customers through their Azure Management Portal.

    The problem has now been fixed. Sorry for the inconvenience.

    Regards,

    Will


    Friday, July 18, 2014 7:36 AM
    Moderator
  • Hi,

    Hope this helped !!

    Regards,

    Sowmya

    Friday, July 18, 2014 7:49 AM
    Moderator