AppFabric 1.1 Cache - Problem with Start-CacheCluster

    Question

  • I have the following environment:

    8 Load-balanced Web servers running Windows Server 2008 R2 x64 Standard

    2 Clustered database servers running Windows Server 2008 R2 x64 Enterprise and Sql Server 2008 R2 Enterprise

    All servers are in a domain and in the same VLAN without any port restrictions between them.

    I am using a domain account that is a local admin on all servers, and I run PowerShell as administrator.

    I installed, configured and tested AppFabric Cache on the first server, and it worked.

    The cache cluster configuration provider is Sql Server.

    Then I installed and configured AppFabric Cache on the second server and tried to start the cache cluster with Start-CacheCluster, but after 5 minutes the following entries appear in Event Viewer on both servers:

    Log Name:      Application
    Source:        .NET Runtime
    Date:          10/05/2012 18:54:51
    Event ID:      1026
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      XXXXXXXXXXX
    Description:
    Application: DistributedCacheService.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: Microsoft.ApplicationServer.Caching.DataCacheException
    Stack:
       at Microsoft.ApplicationServer.Caching.VelocityWindowsService.StartServiceCallback(System.Object)
       at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()
       at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

    and

    Log Name:      Application
    Source:        Application Error
    Date:          10/05/2012 18:54:52
    Event ID:      1000
    Task Category: (100)
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      XXXXXXXXXXXX
    Description:
    Faulting application name: DistributedCacheService.exe, version: 1.0.4632.0, time stamp: 0x4eafeccf
    Faulting module name: KERNELBASE.dll, version: 6.1.7601.17651, time stamp: 0x4e21213c
    Exception code: 0xe0434352
    Fault offset: 0x000000000000cacd
    Faulting process id: 0x9f60
    Faulting application start time: 0x01cd2ef65b24cc93
    Faulting application path: C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe
    Faulting module path: C:\Windows\system32\KERNELBASE.dll
    Report Id: c4cc0d1b-9aea-11e1-b7db-d4bed9b1a0bd

    Can someone help me with this? I have searched the web extensively and found nothing conclusive, and I have been stuck on this for hours.

    Thank you.

    EDIT:

    Here is the output when I attempt to start the cluster:

    PS C:\Windows\system32> Start-CacheCluster
    Start-CacheCluster : ErrorCode<ERRCAdmin003>:SubStatus<ES0001>:Time-out occurred on net.tcp://host2:22233.
    At line:1 char:19
    + Start-CacheCluster <<<<
        + CategoryInfo          : NotSpecified: (:) [Start-CacheCluster], DataCacheException
        + FullyQualifiedErrorId : ERRCAdmin003,Microsoft.ApplicationServer.Caching.Commands.StartCacheClusterCommand
    
    
    HostName : CachePort Service Name            Service Status Version Info
    -------------------- ------------            -------------- ------------
    host1:22233    AppFabricCachingService UP             3 [3,3][1,3]
    host2:22233    AppFabricCachingService STARTING       3 [3,3][1,3]

    After 5 minutes, here is the output of Get-CacheHost

    PS C:\Windows\system32> Get-CacheHost
    
    HostName : CachePort Service Name            Service Status Version Info
    -------------------- ------------            -------------- ------------
    host1:22233    AppFabricCachingService UP             3 [3,3][1,3]
    host2:22233    AppFabricCachingService DOWN           3 [3,3][1,3]

    Also, here is my cluster configuration, exported with "Export-CacheClusterConfig -File c:\temp\clusterconfig.xml":

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
        <configSections>
            <section name="dataCache" type="Microsoft.ApplicationServer.Caching.DataCacheSection, Microsoft.ApplicationServer.Caching.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" />
        </configSections>
        <dataCache size="Medium">
            <caches partitionCount="256">
                <cache consistency="StrongConsistency" name="default" minSecondaries="0">
                    <policy>
                        <eviction type="Lru" />
                        <expiration defaultTTL="10" isExpirable="true" />
                    </policy>
                </cache>
            </caches>
            <hosts>
                <host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
                    hostId="226475190" size="8185" leadHost="true" account="Domain\User"
                    cacheHostName="AppFabricCachingService" name="host1"
                    cachePort="22233" />
                <host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
                    hostId="2055987555" size="8185" leadHost="false" account="Domain\User"
                    cacheHostName="AppFabricCachingService" name="host2"
                    cachePort="22233" />
            </hosts>
            <deploymentSettings>
                <deploymentMode value="RoutingClient" />
            </deploymentSettings>
        </dataCache>
    </configuration>
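
    For anyone comparing host entries in an exported configuration like the one above, here is a minimal Python sketch that lists the ports declared for each host. It is only an illustration: the function name `host_endpoints` and the trimmed `SAMPLE` string are mine, and it assumes the file follows the same `<hosts>/<host>` layout shown in this post.

```python
import xml.etree.ElementTree as ET

# Port attributes that appear on each <host> element in the exported config.
PORT_ATTRS = ("cachePort", "clusterPort", "arbitrationPort", "replicationPort")


def host_endpoints(config_xml):
    """Return (host name, port attribute, port number) for every <host> element."""
    root = ET.fromstring(config_xml)
    endpoints = []
    for host in root.iter("host"):
        for attr in PORT_ATTRS:
            value = host.get(attr)
            if value is not None:  # skip attributes that are not declared
                endpoints.append((host.get("name"), attr, int(value)))
    return endpoints


# Trimmed copy of the <hosts> section from the export (illustrative only).
SAMPLE = """<configuration>
  <dataCache size="Medium">
    <hosts>
      <host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
            name="host1" cachePort="22233" />
      <host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
            name="host2" cachePort="22233" />
    </hosts>
  </dataCache>
</configuration>"""

if __name__ == "__main__":
    for name, attr, port in host_endpoints(SAMPLE):
        print(f"{name}:{port} ({attr})")
```

    To run it against the real export, read the file contents and pass them to `host_endpoints` instead of `SAMPLE`.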



    • Edited by Pedro Lima Friday, May 11, 2012 4:54 PM
    Friday, May 11, 2012 12:57 PM

All replies

  • Hi Pedro,

    We need to know the exact DataCacheException in order to help you. Can you check the events under the log name "Microsoft-Windows-Application Server-System Services/Admin"?

    Also, please check if there could be any connectivity or firewall issues between the machines.
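
    One rough way to check basic TCP reachability is a small script run from each cache host against the others. The Python sketch below is an illustration, not an AppFabric tool: the port numbers are the ones from the exported configuration in the question, the host names `host1` and `host2` are the placeholders used there, and the helper names are made up for this example. Note that a port also shows as unreachable when the caching service on the target is simply not running, so a failure here does not by itself prove a firewall problem.

```python
import socket

# cachePort, clusterPort, arbitrationPort, replicationPort from the exported config.
CACHE_PORTS = (22233, 22234, 22235, 22236)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_hosts(hosts, ports=CACHE_PORTS, timeout=3.0):
    """Map each (host, port) pair to True (reachable) or False (not reachable)."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}


if __name__ == "__main__":
    # Placeholder host names; replace with the real cache hosts.
    for (host, port), ok in check_hosts(["host1", "host2"]).items():
        print(f"{host}:{port} {'open' if ok else 'unreachable'}")
```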

    Thanks,

    Bharath

    Friday, May 11, 2012 1:37 PM
  • Thanks for your response, Bharath. Here is what you asked for.

    Also, there are no port restrictions between the servers.

    Log Name:      Microsoft-Windows-Application Server-System Services/Admin
    Source:        Microsoft-Windows Server AppFabric Caching
    Date:          11/05/2012 10:39:56
    Event ID:      111
    Task Category: (1)
    Level:         Error
    Keywords:      
    User:          NETWORK SERVICE
    Computer:     xxxxxxxxxxxxxxxx
    Description:
    AppFabric Caching service crashed with exception {Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<UnspecifiedErrorCode>:SubStatus<ES0001>:ErrorCode<ERRService0001>:SubStatus<ES0001>:Service initialization failed. No user action required. ---> Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRService0001>:SubStatus<ES0001>:Service initialization failed. No user action required. ---> Microsoft.Fabric.Common.OperationCompletedException: Operation completed with an exception ---> System.TimeoutException: The operation has timed out.
       --- End of inner exception stack trace ---
       at Microsoft.Fabric.Common.OperationContext.End()
       at Microsoft.Fabric.Common.SharedCommunicationObject.EndOpen(IAsyncResult result)
       at Microsoft.Fabric.Federation.FederationSite.EndOpen(IAsyncResult result)
       at Microsoft.Fabric.Data.ReliableServiceManager.EndOpen(IAsyncResult ar)
       at Microsoft.ApplicationServer.Caching.DOMNode..ctor(Int32 id, String displayFriendlyNodeId, Int32 port, EndpointID[] urisDOM, ServiceConfigurationManager configurationManager, ReliableServiceProvider dataStore, ServiceResolverBase& client)
       --- End of inner exception stack trace ---
       at Microsoft.ApplicationServer.Caching.DOMNode..ctor(Int32 id, String displayFriendlyNodeId, Int32 port, EndpointID[] urisDOM, ServiceConfigurationManager configurationManager, ReliableServiceProvider dataStore, ServiceResolverBase& client)
       at Microsoft.ApplicationServer.Caching.DistributedObjectManager..ctor(EndpointID[] urisDOM, ServiceConfigurationManager configurationManager, WcfServerChannel channel)
       at Microsoft.ApplicationServer.Caching.DistributedObjectManager.GetInstance(EndpointID[] urisDOM, ServiceConfigurationManager configurationManager, WcfServerChannel channel)
       at Microsoft.ApplicationServer.Caching.ServiceLayer.ServiceStart(Boolean deleteTkt)
       at Microsoft.ApplicationServer.Caching.DataCacheServiceBase.ServiceStart(ServiceConfigurationManager scm, Boolean deleteTkt)
       at Microsoft.ApplicationServer.Caching.VelocityWindowsService.StartService(Boolean deleteTKT)
       at Microsoft.ApplicationServer.Caching.VelocityWindowsService.OnStart(String[] args)
       --- End of inner exception stack trace ---
       at Microsoft.ApplicationServer.Caching.VelocityWindowsService.ThrowCallback(Object exception)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()
       at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()}. Check debug log for more information
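
    The message above nests several exceptions separated by "--->", and the last one in the chain is the root cause. As a reading aid, here is a small Python sketch (the function name `exception_chain` is just for illustration) that pulls the exception type names out of the first line of that event text, outermost first:

```python
import re


def exception_chain(message):
    """Return the exception type names in a .NET message, outermost first."""
    names = []
    for part in message.split(" ---> "):
        head = part.split(":", 1)[0]  # text before the first colon
        # The type name is the last token once spaces and '{' are stripped away.
        names.append(re.split(r"[\s{]+", head.strip())[-1])
    return names


# First line of the event description quoted above.
LOG = (
    "AppFabric Caching service crashed with exception "
    "{Microsoft.ApplicationServer.Caching.DataCacheException: "
    "ErrorCode<UnspecifiedErrorCode>:SubStatus<ES0001>:ErrorCode<ERRService0001>:"
    "SubStatus<ES0001>:Service initialization failed. No user action required. "
    "---> Microsoft.ApplicationServer.Caching.DataCacheException: "
    "ErrorCode<ERRService0001>:SubStatus<ES0001>:Service initialization failed. "
    "No user action required. "
    "---> Microsoft.Fabric.Common.OperationCompletedException: "
    "Operation completed with an exception "
    "---> System.TimeoutException: The operation has timed out."
)

if __name__ == "__main__":
    for depth, name in enumerate(exception_chain(LOG)):
        print("  " * depth + name)
```

    For this event the innermost entry is System.TimeoutException, so the two DataCacheException wrappers are reporting a timeout during service start rather than a separate failure.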

    I have edited the main post to provide more information.



    • Edited by Pedro Lima Friday, May 11, 2012 4:52 PM
    Friday, May 11, 2012 1:47 PM
  • We got the exact same exception in the following environment:

    9 Load-balanced Web servers running Windows Server 2008 R2 x64 Standard

    2 Clustered database servers running Windows Server 2008 R2 x64 Enterprise and Sql Server 2008 R2 Enterprise

    3 AppFabric Cache servers in a cluster running Windows Server 2008 R2 x64 Enterprise

    - 2 AppFabric servers run fine.

    - 1 AppFabric server was running fine until last week. Then it started to throw the same exception that Pedro Lima reported.

    All servers are in a domain and in the same VLAN with port restrictions between them.

    I am using a domain account that is a local admin on all servers, and I run PowerShell as administrator.

    The cache cluster configuration provider is Sql Server.


    PowerShell :

    PS C:\Windows\system32> start-cachehost APPFABRICSV3 22233 -Debug -Verbose
    
    HostName : CachePort                  Service Name            Service Status Version Info
    --------------------                  ------------            -------------- ------------
    
    APPFABRICSV3:22233                    AppFabricCachingService STARTING       3 [3,3][1,3]
    
    Start-CacheHost : Error occurred while performing the operation on host APPFABRICSV3:22233 : ErrorCode<ERRCAdmin003>:SubStatus<ES0001>:Time-out occurred on net.tcp://APPFABRICSV3:22233.
    At line:1 char:16
    + start-cachehost <<<<  APPFABRICSV3 22233 -Debug -Verbose
        + CategoryInfo          : NotSpecified: (:) [Start-CacheHost], DataCacheException
        + FullyQualifiedErrorId : ERRCAdmin003,Microsoft.ApplicationServer.Caching.Commands.StartCacheHostCommand
    
    	
    PS C:\Windows\system32> get-cachehost
    
    HostName : CachePort                  Service Name            Service Status Version Info
    --------------------                  ------------            -------------- ------------
    APPFABRICSV1:22233                    AppFabricCachingService UP             3 [3,3][1,3]
    APPFABRICSV2:22233                    AppFabricCachingService UP             3 [3,3][1,3]
    APPFABRICSV3:22233                    AppFabricCachingService UNKNOWN        3 [3,3][1,3]	
    	
    	
    PS C:\Windows\system32> get-cachehostconfig APPFABRICSV3 22233
    
    HostName        : APPFABRICSV3
    ClusterPort     : 22234
    CachePort       : 22233
    ArbitrationPort : 22235
    ReplicationPort : 22236
    Size            : 4095 MB
    ServiceName     : AppFabricCachingService
    HighWatermark   : 99%
    LowWatermark    : 90%
    IsLeadHost      : True	
    	

    PowerShell log

    Host APPFABRICSV1 is Reachable.,DistributedCache.CacheAdmin,Verbose,2012-6-7 11:13:44.439
    Command Start-CacheHost Parameters: APPFABRICSV3, 22233, -100:  Time=06/07/2012 15:15:07,DistributedCache.AdminPS,Verbose,2012-6-7 11:15:07.456
    Host APPFABRICSV3 is Reachable.,DistributedCache.CacheAdmin,Verbose,2012-6-7 11:15:07.487
    Host APPFABRICSV3 is Reachable.,DistributedCache.CacheAdmin,Verbose,2012-6-7 11:16:07.675

    EventLog Applications

    Faulting application name: DistributedCacheService.exe, version: 1.0.4632.0, time stamp: 0x4eafeccf
    Faulting module name: KERNELBASE.dll, version: 6.1.7601.17651, time stamp: 0x4e21213c
    Exception code: 0xe0434352
    Fault offset: 0x000000000000cacd
    Faulting process id: 0xfec
    Faulting application start time: 0x01cd44c1f9696d61
    Faulting application path: C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe
    Faulting module path: C:\Windows\system32\KERNELBASE.dll
    Report Id: 59912a24-b0b6-11e1-b78c-0050569400cb

    EventLog Applications

    Application: DistributedCacheService.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: Microsoft.ApplicationServer.Caching.DataCacheException
    Stack:
       at Microsoft.ApplicationServer.Caching.VelocityWindowsService.StartServiceCallback(System.Object)
       at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()
       at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

    EventLog System


    The AppFabric Caching Service service terminated unexpectedly.  It has done this 6 time(s).

    EventLog Microsoft-Windows-Application


    AppFabric Caching service crashed with exception {Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<UnspecifiedErrorCode>:SubStatus<ES0001>:ErrorCode<ERRService0001>:SubStatus<ES0001>:Service initialization failed. No user action required. ---> Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRService0001>:SubStatus<ES0001>:Service initialization failed. No user action required. ---> Microsoft.Fabric.Common.OperationCompletedException: Operation completed with an exception ---> System.TimeoutException: The operation has timed out.
       --- End of inner exception stack trace ---
       at Microsoft.Fabric.Common.OperationContext.End()
       at Microsoft.Fabric.Common.SharedCommunicationObject.EndOpen(IAsyncResult result)
       at Microsoft.Fabric.Federation.FederationSite.EndOpen(IAsyncResult result)
       at Microsoft.Fabric.Data.ReliableServiceManager.EndOpen(IAsyncResult ar)
       at Microsoft.ApplicationServer.Caching.DOMNode..ctor(Int32 id, String displayFriendlyNodeId, Int32 port, EndpointID[] urisDOM, ServiceConfigurationManager configurationManager, ReliableServiceProvider dataStore, ServiceResolverBase& client)
       --- End of inner exception stack trace ---
       at Microsoft.ApplicationServer.Caching.DOMNode..ctor(Int32 id, String displayFriendlyNodeId, Int32 port, EndpointID[] urisDOM, ServiceConfigurationManager configurationManager, ReliableServiceProvider dataStore, ServiceResolverBase& client)
       at Microsoft.ApplicationServer.Caching.DistributedObjectManager..ctor(EndpointID[] urisDOM, ServiceConfigurationManager configurationManager, WcfServerChannel channel)
       at Microsoft.ApplicationServer.Caching.DistributedObjectManager.GetInstance(EndpointID[] urisDOM, ServiceConfigurationManager configurationManager, WcfServerChannel channel)
       at Microsoft.ApplicationServer.Caching.ServiceLayer.ServiceStart(Boolean deleteTkt)
       at Microsoft.ApplicationServer.Caching.DataCacheServiceBase.ServiceStart(ServiceConfigurationManager scm, Boolean deleteTkt)
       at Microsoft.ApplicationServer.Caching.VelocityWindowsService.StartService(Boolean deleteTKT)
       at Microsoft.ApplicationServer.Caching.VelocityWindowsService.OnStart(String[] args)
       --- End of inner exception stack trace ---
       at Microsoft.ApplicationServer.Caching.VelocityWindowsService.ThrowCallback(Object exception)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()
       at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()}. Check debug log for more information





    Monday, June 11, 2012 6:17 PM
  • We fixed the issue by restoring a backup of our faulty server and by restarting the cluster.
    Tuesday, June 12, 2012 5:22 PM