I have a Web Role (West Europe) with 2 instances that uses the new Private Cache (or whatever it is called). The cache takes up 30% of each instance's memory (the default setting).
Today our service crashed and was completely down for 18 minutes (data from Pingdom).
I can see in the WAD Event Log the following:
Application: CacheService.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Microsoft.ApplicationServer.Caching.DataCacheException
Stack:
   at Microsoft.Fabric.Common.IOCompletionPortWorkQueue.Invoke(System.Threading.WaitCallback, System.Object)
   at Microsoft.Fabric.Common.IOCompletionPortWorkQueue.WorkerThreadStart()
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.ThreadHelper.ThreadStart()
It was logged on Instance 1 on 2012.07.31 12:49:01, and on Instance 2 on 2012.07.31 12:29:03. I could see many ErrorCode<ERRCA0017>:SubStatus<ES0002> errors in the app's trace log within this time range.
Eventually all instances restarted automatically and the service is up again.
I'm trying to connect to the instances with RDP to see the AppFabric logs, but I cannot. It asks for the login, thinks for a bit, and then shows a "cannot connect" dialog.
Could anyone enlighten me as to how this could have happened? As far as I know, Private Cache should be an HA solution, right? (I have configured cache backup.)
From your problem description, it looks like both instances of the caching service crashed. If you are now able to connect to the instances, can you check the crash event in the "Microsoft->Windows-Application Server-System Services/Admin" channel? This would really help narrow down the problem.
OK, today I was able to connect to the instances, and here is what I've found:
- Edited by unbornchikken, August 1, 2012 7:09
I looked at the errors. It looks like you have not configured the correct storage account for the 'ConfigStoreConnectionString' property. Can you please check that?
Of course it is configured correctly. The service had been running for days before the crash happened, for no apparent reason.
I configured caching with VS, and I can see that the Microsoft.WindowsAzure.Plugins.Caching.ConfigStoreConnectionString setting is present only in the cscfg, not in the csdef. Is that correct?
Maybe the storage was not accessible at that time, as happened last week: http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/cb715eb2-a6de-45f9-8f08-d306f5332681
Could this be the source of the problem? Maybe the cache client code could be more defensive than the current implementation: when the storage (and the config) is not available, it should use the last known config instead of crashing.
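To illustrate what I mean by "use the last known config", here is a rough, language-neutral sketch in Python. It is not the Azure caching code or its API; LastKnownGoodConfig, ConfigUnavailable and the load callable are hypothetical names made up for this example. The idea is just that a failed read from the config store falls back to the last value that was read successfully, and only fails when nothing has ever been loaded.

```python
from typing import Callable, Optional


class ConfigUnavailable(Exception):
    """Raised when the config store fails and no earlier config is cached."""


class LastKnownGoodConfig:
    """Wraps a config loader and remembers the last config it returned."""

    def __init__(self, load: Callable[[], dict]):
        # 'load' is a hypothetical callable, e.g. one that reads the
        # configuration from a storage account and may raise on outage.
        self._load = load
        self._last_good: Optional[dict] = None

    def get(self) -> dict:
        try:
            config = self._load()
        except Exception as exc:
            # Store unreachable: fall back to the last successful read.
            if self._last_good is None:
                raise ConfigUnavailable(
                    "config store unreachable and no cached config"
                ) from exc
            return self._last_good
        # Successful read: remember it for future outages.
        self._last_good = config
        return config
```

With something like this, a temporary storage outage would only be fatal at first start-up, when there is no earlier config to fall back on. A real implementation would probably also want to log the fallback and limit how stale the cached config is allowed to get.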