We are analyzing an issue related to eviction on an AppFabric cache cluster that’s running on 4 web servers. The application using the cache cluster is an ASP.NET 3.5 web application configured to use the AppFabric session state provider, so all user sessions are maintained in memory. The named cache that holds this session data was created using the following command:
PS command: New-Cache -CacheName SessionData
We enabled operational logging on the cache host to troubleshoot the issue, and we see the following messages in the event log. (See Error logged in operation log.PNG)
After reading through the links mentioned below, we are a bit confused as to what might be happening here.
Eviction troubleshooting: http://msdn.microsoft.com/en-us/library/ff921021.aspx
Throttling troubleshooting: http://msdn.microsoft.com/en-us/library/ff921030.aspx
Refer to the Task Manager screenshots:
1. The cache has only consumed around 150 megs of RAM, and the web application around 400 megs. (See Memory usage.PNG)
2. Overall available memory is around 800 megs. (See Memory usage.PNG)
I ran Get-CacheHostConfig to check the cache host configuration: (See Cache host configuration.PNG)
As you can see, the configured size is 1 GB and the watermark settings are 90% and 70%. The servers had only 2 GB of memory when the cluster was installed, hence the default size of 1 GB. We haven’t changed the configuration yet to increase the size to 2 GB. From the explanations given in the above links, we are not sure what might be happening here. Will reconfiguring the size solve the issue, or is low available RAM on the system causing it?
The images mentioned above can be downloaded from http://www.box.net/shared/kcm7csk9s6
Thanks for the question. The first question I have is whether you have one or two additional servers that you could use as dedicated cache servers that are separate from the web servers. We've seen issues with performance and stability when the caching service has been running on application/web servers. There are problems around contention for available memory, network utilization, etc. So if you have the machines, I would suggest going this route. I don't think you would see the same issues with dedicated servers.
With that said, I understand that you've started down this path and that additional servers may not be readily available. So I'd like to do my best to help if I can. First, in your screen shots of the memory, you're showing the memory "working set" columns. It's my understanding that paged memory may not be included in these working set columns. If some of your memory is paged to disk, then that memory will not be counted in these totals. The best column to use in Task Manager is "Commit Size". This correlates to "Private Bytes" in Performance Monitor. So the first thing I would do is add that column and see if it shows higher memory use than the columns you're looking at now.
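If it is easier than adding columns in Task Manager, the same comparison can be pulled from PowerShell. This is only a rough sketch: it assumes the caching service runs as a process named DistributedCacheService and that the web application runs under w3wp, so adjust the names if yours differ.

```powershell
# Assumption: process names are DistributedCacheService (cache host) and w3wp (IIS worker).
# WorkingSet64 is the working set; PrivateMemorySize64 corresponds to Private Bytes / Commit Size.
Get-Process -Name DistributedCacheService, w3wp -ErrorAction SilentlyContinue |
    Select-Object Name, Id,
        @{ Name = 'WorkingSetMB';   Expression = { [math]::Round($_.WorkingSet64 / 1MB, 1) } },
        @{ Name = 'PrivateBytesMB'; Expression = { [math]::Round($_.PrivateMemorySize64 / 1MB, 1) } } |
    Format-Table -AutoSize
```

If PrivateBytesMB is noticeably larger than WorkingSetMB for either process, part of that process's memory has likely been paged out and the working set columns are understating its use.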
When you're getting these errors, could you post the output of:
I know you say that you only have 150 MB of data in the cache. If that is the case, then you shouldn't be hitting the high watermark (90% of 1GB is 900 MB).
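For reference, here is the watermark arithmetic written out, using the size and percentages from your Get-CacheHostConfig output. The variable names are just for illustration, and I'm treating 1 GB as 1024 MB, which is why the result is a bit above the round 900 MB figure.

```powershell
# Values taken from the Get-CacheHostConfig output described in the question.
$cacheSizeMB   = 1024   # configured cache size (1 GB)
$highWatermark = 90     # percent
$lowWatermark  = 70     # percent

$highMB = $cacheSizeMB * $highWatermark / 100   # 921.6
$lowMB  = $cacheSizeMB * $lowWatermark  / 100   # 716.8

"High watermark: $highMB MB, low watermark: $lowMB MB"
```

With roughly 150 MB in the cache, you would be far below both thresholds.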
In looking at your event log screen shot (very helpful, BTW), I see it giving percentages for various pieces. I'm going to try to run a test to verify, but I think this is saying that you're using very little memory for your cache but that the system memory is at 19% available (< 200 MB).
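To check the system-wide number on a host while the errors are occurring, a quick sketch along these lines should work. It reads the standard Win32_OperatingSystem WMI class; the variable names are mine, and the free-memory figure it reports may not match Task Manager's "Available" value exactly.

```powershell
# Win32_OperatingSystem reports both values in kilobytes.
$os = Get-WmiObject -Class Win32_OperatingSystem

$totalMB = [math]::Round($os.TotalVisibleMemorySize / 1KB, 0)
$freeMB  = [math]::Round($os.FreePhysicalMemory / 1KB, 0)
$pctFree = [math]::Round(100 * $os.FreePhysicalMemory / $os.TotalVisibleMemorySize, 1)

"Total: $totalMB MB, free: $freeMB MB ($pctFree% free)"
```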
Finally, I am wondering about the performance counters that indicate eviction. Can you look at those and prove that eviction is happening? Are you seeing errors in your web application that make you think eviction is happening as well?
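If you want to find the relevant counters from PowerShell rather than Performance Monitor, something like the sketch below may help. I'm assuming the counter set names begin with "AppFabric Caching" and that the eviction counters have "Evict" in their names, so the script lists what is actually installed on the host instead of hard-coding paths.

```powershell
# Assumption: AppFabric counter sets are named "AppFabric Caching..." on this host.
$paths = Get-Counter -ListSet 'AppFabric Caching*' -ErrorAction SilentlyContinue |
    Select-Object -ExpandProperty Counter |
    Where-Object { $_ -match 'Evict' }

if ($paths) {
    # Take three samples, five seconds apart, to see whether the values are climbing.
    Get-Counter -Counter $paths -SampleInterval 5 -MaxSamples 3
} else {
    'No eviction-related AppFabric counters were found on this host.'
}
```

If the eviction totals increase between samples while the errors are being logged, that would confirm eviction is really happening.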
I have a few other thoughts, but this post is getting long. Can you get back to me and we'll go from there? Thanks.