Cache starts behaving oddly after 3 days

    Question

  • This is really hard to describe.  We have 3 caching servers, 4 GB per box, in an HA configuration (one secondary per cache). In total, 4 web servers use the cache.  We have several named caches, and when the issues start, they hit all of them.  Local cache is enabled.

    We moved one of our websites to AppFabric Caching about a month ago, and it worked great, with no issues.  We recently added another application (another website), and now about every 48-72 hours the caching just starts acting odd.

    By odd, I mean: tags attached to items seem to disappear; put statements seem to work (no exceptions) but are not reflected in later gets; and session state stops working (I assume, again, because the puts seem to do nothing).  However, gets seem to work (sometimes with tags).  Clearing regions does not seem to work.  The client does not throw any exceptions either, apart from an occasional exception about an enumeration changing (we have some notification callbacks set up), so that one is expected.  Our notification poll interval is set to 30 seconds.

    So far the fix is to run Restart-CacheCluster, after which everything is happy again for a few days.  While the issue is happening, the caching servers appear healthy, and commands issued via PowerShell return what look like normal results.

    I really have no idea where to start looking, any ideas?
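
    Since puts raise no exception while the cluster is in this state, one way to detect it early is a write-then-read probe. The sketch below is only an illustration: `CacheClient`, `DictCache`, and `SilentDropCache` are hypothetical stand-ins, not the AppFabric API, and the probe would need to be adapted to whatever client the web servers actually use.

```python
import uuid
from typing import Dict, Optional, Protocol


class CacheClient(Protocol):
    """Hypothetical minimal client interface (NOT the AppFabric API)."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


def put_is_visible(cache: CacheClient, prefix: str = "probe") -> bool:
    """Write a unique value, then check that a later get returns it."""
    key = f"{prefix}:{uuid.uuid4()}"
    value = str(uuid.uuid4())
    cache.put(key, value)
    return cache.get(key) == value


class DictCache:
    """In-memory stand-in for a healthy cache."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class SilentDropCache(DictCache):
    """Stand-in that mimics the symptom: put succeeds but stores nothing."""

    def put(self, key: str, value: str) -> None:
        pass


if __name__ == "__main__":
    print("healthy:", put_is_visible(DictCache()))
    print("degraded:", put_is_visible(SilentDropCache()))
```

    Because local cache is enabled in this setup, a get may be answered without reaching the cluster, so a probe like this is probably most informative from a client configured without local cache.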

    February 5, 2012 17:19

All replies

  • 1. Which version of the cache are you running?

    2. What you are describing sounds like eviction (except for the tags being removed from objects; that one needs to be verified):

    http://msdn.microsoft.com/en-us/library/ff637725.aspx

    Can you monitor the perf counters to verify?

     

     

    February 5, 2012 21:02
  • Hello, I think I am running the latest version. I don't know exactly how to get you the running version (I don't see a PowerShell command for it), but here is the file version: 1.0.2912.0

    I am recording every cache performance counter.  Anything I should be looking for?  It will most likely happen again tomorrow or the day after.

    February 6, 2012 17:49
  • Memory pressure and eviction can cause what you are experiencing. Please have a look at this MSDN page on diagnosing and troubleshooting these issues.

    http://msdn.microsoft.com/en-us/library/ee790981.aspx

    February 6, 2012 19:21
  • Are you using tags on short-lived objects?

    We've seen memory problems in this scenario, as the tags hang around after the objects have been removed from the cache...

    http://social.msdn.microsoft.com/Forums/en-US/velocity/thread/8a68d69b-6ffe-4006-bc0f-feef24b8a5de

    February 9, 2012 15:15
  • I have been trying to capture the logs; however, the recording seems to stop randomly.

    Anyway, it has happened 3 more times since I made this post. One was this morning; before I fixed it, I looked at the memory usage on each machine:

    Cache01: 1.84 GB out of 3 GB (lead host)
    Cache02: 1.97 GB out of 3 GB (lead host)
    Cache03: 2.65 GB out of 3 GB

    In this case all of the tags were present on objects, the regions were in place, and I could get objects by tag, but I could not get objects by key.  Nor could we add/put objects into the cache until it was restarted.

    So, Cache03 was basically maxed out. Could that be causing the issues? If so, why did it not load balance properly to use up the other space?  All of our caches have 1 secondary defined as well.
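
    To put the numbers above in perspective, here is a small sketch that turns them into utilization percentages and compares them against eviction watermarks. It treats the 3 GB figure as the configured cache size per host, and the 70% / 90% thresholds are assumptions (believed to be common defaults, but they should be checked against the actual host configuration).

```python
# Figures reported in this post: (GB used, GB available to the cache).
HOSTS = {
    "Cache01": (1.84, 3.0),
    "Cache02": (1.97, 3.0),
    "Cache03": (2.65, 3.0),
}

# ASSUMPTION: watermark percentages; verify against the real host config.
LOW_WATERMARK_PCT = 70.0
HIGH_WATERMARK_PCT = 90.0


def utilization_pct(used_gb: float, size_gb: float) -> float:
    """Return memory use as a percentage of the configured size."""
    return 100.0 * used_gb / size_gb


def classify(pct: float,
             low: float = LOW_WATERMARK_PCT,
             high: float = HIGH_WATERMARK_PCT) -> str:
    """Place a utilization figure relative to the two watermarks."""
    if pct >= high:
        return "above high watermark"
    if pct >= low:
        return "between watermarks"
    return "below low watermark"


def report(hosts=HOSTS):
    """Return (host, percent rounded to 1 decimal, classification) rows."""
    rows = []
    for name, (used, size) in hosts.items():
        pct = utilization_pct(used, size)
        rows.append((name, round(pct, 1), classify(pct)))
    return rows


if __name__ == "__main__":
    for name, pct, status in report():
        print(f"{name}: {pct}% ({status})")
```

    Under these assumed thresholds only Cache03 is past the low watermark, and it is close to the high one, which would be consistent with eviction pressure on that single host rather than on the cluster as a whole.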

    February 15, 2012 17:01
  • The data is stored in partitions, which should be load balanced by our monitoring layer. Are you using one or two regions very heavily? A region is stored on a single cache host only, and this can lead to issues like these.
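
    The effect of heavy region use can be illustrated with a toy placement model. This is not AppFabric's actual partitioning algorithm: `host_for` simply hashes a placement key onto one of three hosts, and the region name "Sessions" is hypothetical. The only property it is meant to show is the one stated above, that everything in a region lands on the same host.

```python
import hashlib
from collections import Counter

HOSTS = ["Cache01", "Cache02", "Cache03"]


def host_for(placement_key: str) -> str:
    """Toy placement: hash the key onto a host (illustrative only)."""
    digest = hashlib.sha256(placement_key.encode("utf-8")).digest()
    return HOSTS[int.from_bytes(digest[:8], "big") % len(HOSTS)]


def place(items):
    """Count items per host.

    items: iterable of (key, region) pairs, where region may be None.
    Items in a region are placed by the region name, so they all share
    one host; items without a region are placed by their own key.
    """
    counts = Counter()
    for key, region in items:
        counts[host_for(region if region is not None else key)] += 1
    return counts


no_region = [(f"item-{i}", None) for i in range(3000)]
one_region = [(f"item-{i}", "Sessions") for i in range(3000)]

if __name__ == "__main__":
    print("without regions:", dict(place(no_region)))
    print("single region:  ", dict(place(one_region)))
```

    In this model the items without a region spread across all three hosts, while the 3000 items in one region pile onto a single host, which is the kind of imbalance the memory figures for Cache03 suggest.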

    February 17, 2012 8:32