SharePoint 2013 Workflows keep crashing due to Distributed Logon Token Cache timeouts. Cache keeps growing, doesn't delete expired data.

    Question


  • We are using SharePoint 2013 on Windows Server 2012

    We are having problems with SharePoint 2013 workflows because the Distributed Cache keeps growing. At some point it crashes, and 2013 workflows stop running with an HTTP 401 Unauthorized error. I think Workflow Manager tries to retrieve the user's token and the request times out because of the cache's size. Both the Distributed Logon Token Cache and the SPViewStateCache time out.

    The ULS logs show:

    Unexpected error occurred in method 'GetObject' , usage 'Distributed Logon Token Cache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.. Additional Information : The client was trying to communicate with the server : net.tcp://SP2013.Domain.local:22233    
     at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)    
     at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)    
     at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass49.<Get>b__48()    
     at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.GetObject(String key)'.



    Unexpected error occurred in method 'Put' , usage 'SPViewStateCache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.. Additional Information : The client was trying to communicate with the server : net.tcp://SP2013Dev.DMC.local:22233    
     at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)    
     at Microsoft.ApplicationServer.Caching.DataCache.InternalPut(String key, Object value, DataCacheItemVersion oldVersion, TimeSpan timeout, DataCacheTag[] tags, String region, IMonitoringListener listener)    
     at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass25.<Put>b__24()    
     at Microsoft.ApplicationServer.Caching.DataCache.Put(String key, Object value, TimeSpan timeout)    
     at Microsoft.SharePoint.DistributedCaching.SPDistributedCache.Put(String key, Object value)'.

    I've rebuilt the Workflow Manager farm, but the problem keeps coming back. After a couple of days of troubleshooting and rebuilding Workflow Manager, Service Bus, Distributed Cache, and UPS while trying to locate the problem, I was able to get workflows running again, but they will break again once the cache grows too big.



    I set the limit to 1024 MB with this PowerShell command, but it's not working.

    Update-SPDistributedCacheSize -CacheSizeInMB 1024



     PS C:\Windows\system32> Get-AFCacheHostConfiguration -ComputerName SP2013 -CachePort "22233"


    HostName        : SP2013.Domain.local
    ClusterPort     : 22234
    CachePort       : 22233
    ArbitrationPort : 22235
    ReplicationPort : 22236
    Size            : 1024 MB
    ServiceName     : AppFabricCachingService
    HighWatermark   : 99%
    LowWatermark    : 90%
    IsLeadHost      : True


    The cache just keeps growing. What can I do to have Distributed Cache delete expired data?

    Command run at 3 PM

    PS C:\Windows\system32> Get-CacheStatistics -ComputerName sp2013  -CachePort 22233


    Size            : 3397632
    ItemCount       : 198
    RegionCount     : 304
    NamedCacheCount : 11
    RequestCount    : 2530
    MissCount       : 562



    At 4:30 PM

    PS C:\Windows\system32> Get-CacheStatistics -ComputerName sp2013 -CachePort 22233


    Size            : 8997888
    ItemCount       : 260
    RegionCount     : 607
    NamedCacheCount : 11
    RequestCount    : 9372
    MissCount       : 1242



    At 6 PM

    PS C:\Windows\system32> Get-CacheStatistics -ComputerName sp2013 -CachePort 22233


    Size            : 9325568
    ItemCount       : 222
    RegionCount     : 629
    NamedCacheCount : 11
    RequestCount    : 10224
    MissCount       : 1292



    Here are my cache settings for the Distributed Logon Token Cache and the SPViewStateCache.



    PS C:\Windows\system32> Get-CacheConfig -CacheName DistributedViewStateCache_a080f929-f0d1-42cd-a9c1-14cc3ae717c3


    CacheName                : DistributedViewStateCache_a080f929-f0d1-42cd-a9c1-14cc3ae717c3
    TimeToLive               : 10 mins
    CacheType                : Partitioned
    Secondaries              : 0
    MinSecondaries           : 0
    IsExpirable              : True
    EvictionType             : LRU
    NotificationsEnabled     : False
    WriteBehindEnabled       : False
    WriteBehindInterval      : 300
    WriteBehindRetryInterval : 60
    WriteBehindRetryCount    : -1
    ReadThroughEnabled       : False
    ProviderType             : 
    ProviderSettings         : {}





    PS C:\Windows\system32> Get-CacheConfig -CacheName DistributedLogonTokenCache_a080f929-f0d1-42cd-a9c1-14cc3ae717c3 


    CacheName                : DistributedLogonTokenCache_a080f929-f0d1-42cd-a9c1-14cc3ae717c3
    TimeToLive               : 10 mins
    CacheType                : Partitioned
    Secondaries              : 0
    MinSecondaries           : 0
    IsExpirable              : True
    EvictionType             : LRU
    NotificationsEnabled     : False
    WriteBehindEnabled       : False
    WriteBehindInterval      : 300
    WriteBehindRetryInterval : 60
    WriteBehindRetryCount    : -1
    ReadThroughEnabled       : False
    ProviderType             : 
    ProviderSettings         : {}


    Thursday, September 05, 2013 11:50 PM

All replies

  • Hi Jerry,

    If my understanding is correct, you are getting exceptions when using the Windows AppFabric cache.

    To resolve the issue, you can try adjusting the AppFabric client configuration:

     <dataCacheClient requestTimeout="15000" channelOpenTimeout="3000" maxConnectionsToServer="100"…

    When the cache is accessed over an HTTP channel, for example in Azure Cache, ServicePointManager must be configured as well. In each client, make sure the following is called at startup:

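      ServicePointManager.UseNagleAlgorithm = false;       // send small packets immediately
      ServicePointManager.Expect100Continue = false;       // skip the 100-Continue handshake
      ServicePointManager.SetTcpKeepAlive(false, 0, 0);    // keep-alive off; the two intervals are ignored when disabled
      ServicePointManager.DefaultConnectionLimit = 1000;   // allow more concurrent connections per endpoint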

    Here is a similar article, it might help you.

    http://blogs.microsoft.co.il/blogs/applisec/archive/2012/08/02/explain-timeouts-on-windows-appfabric-cache.aspx

    I hope this helps.

    Thanks


    Wendy Li
    TechNet Community Support

    Monday, September 09, 2013 10:47 AM
    Moderator
  • Thanks for the info.  Do you know where the configuration file is located? I tried to find it last week but without success.

    Thanks

    Monday, September 09, 2013 2:43 PM
  • Did you ever figure this out? I'm in the same boat: more than 50 connections to Distributed Cache and lots of exceptions. The cache looks healthy, so I want to raise the timeout values from SharePoint to Distributed Cache.
    Sunday, October 13, 2013 9:32 PM
  • You can increase the cache request timeout by running these commands:

    $LogonTokenCache = Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache

    $LogonTokenCache.RequestTimeout = 300

    Set-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache -DistributedCacheClientSettings $LogonTokenCache

    $ViewStateCache = Get-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache

    $ViewStateCache.RequestTimeout = 300

    Set-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache -DistributedCacheClientSettings $ViewStateCache
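
    To confirm that the new values were applied, the settings can be read back afterwards. The following is only a minimal sketch, assuming it runs in a SharePoint 2013 Management Shell (or any PowerShell session on a farm server); the `$container` variable is just an illustrative name, and `ChannelOpenTimeOut` and `MaxConnectionsToServer` are other properties of the same settings object, listed for context.

    ```powershell
    # Load the SharePoint cmdlets if this is a plain PowerShell session.
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    # Read back the client settings for the two caches that were timing out.
    'DistributedLogonTokenCache', 'DistributedViewStateCache' | ForEach-Object {
        $container = $_
        Get-SPDistributedCacheClientSetting -ContainerType $container |
            Select-Object @{ Name = 'Container'; Expression = { $container } },
                          RequestTimeout, ChannelOpenTimeOut, MaxConnectionsToServer
    }
    ```

    If `RequestTimeout` does not show 300 for both containers, the corresponding Set- command probably did not take effect and is worth re-running.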


    • Edited by Jerry Choinski Tuesday, October 29, 2013 9:06 PM formatting
    Tuesday, October 29, 2013 9:05 PM
  • I've been experiencing similar issues. This may help you: SharePoint 2013 distributed cache bug

    Check your AppFabric 1.1 version and see if it includes updates 3 & 4 noted in the article.
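
    One way to see which AppFabric build is installed is to read the file version of the caching service executable. This is a rough sketch, assuming the default install folder `AppFabric 1.1 for Windows Server` under Program Files and the executable name `DistributedCacheService.exe`; both are assumptions about a default setup, so adjust the path if yours differs. Mapping the reported version to a specific update still requires the article mentioned above.

    ```powershell
    # Assumed default location of the AppFabric caching service executable.
    $exe = Join-Path $env:ProgramFiles 'AppFabric 1.1 for Windows Server\DistributedCacheService.exe'

    if (Test-Path $exe) {
        # Show the version information embedded in the executable.
        (Get-Item $exe).VersionInfo | Select-Object FileName, FileVersion, ProductVersion
    } else {
        Write-Warning "AppFabric executable not found at $exe"
    }
    ```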

    • Edited by Bill Burke Wednesday, February 26, 2014 2:24 PM
    Wednesday, February 26, 2014 2:21 PM