locked
Constant Cache misses for CDN endpoint RRS feed

  • Question

  • Hi. I've been testing the CDN endpoint to provide a delivery mechanism for application files. We had previously used File Storage but it wouldn't scale to what we needed so we are testing a move to a central blob storage with a CDN around it to get files closer to the end users that are distributed across the globe.

    Current setup is a single blob storage in West Europe with the files (close to us), and a single CDN endpoint address around it. CDN is a Premium Verizon profile. We also have a central web app that emits the file listing for end users to download (automatically done by an installed windows service on the client machine) and a couple of configuration endpoints on the webapp for the service to obtain the endpoint address and token.

    I've added two rules :

    • "enable Auth token" with IF ALWAYS -> Token Auth : enabled, Token auth Ignore URL case: enabled
    • "2 weeks of cache" with IF ALWAYS -> External Max-age : 2 weeks 

    Also:

    • Query string Caching is set to "standard-cache".
    • Blob files also have the Cache-control header set to "public, max-age=1209600" set when the file is uploaded during builds.

    To test this, I have used my local development machine (in Portugal) to download from the endpoint. First download was slow (as expected) due to the expected cache warm-up, but the second attempt was significantly faster (1.2 GB download took 20 min on a cold run and 4 min after warming up). CDN analytics showed consistent Cache Hits on the second attempt.

    I also set up a VM in Southeast Asia to test a "far away" scenario. First download took significantly longer (as expected - about 90min for the first download, accounting for the latency from EU to Asia) but on the second download attempt, only a couple of files appeared to have cache hits, and after a few minutes, 0 cache hits, meaning I was not getting the expected benefit of caching the files close to the end user.

    I'm wondering if :

    • Is this the expected behavior? My expectation was that the files would have been cached for 2 weeks (or at least some significant period) at the edge(s).
    • Are there limits to the amount of items that can be cached by the edge server? Our apps are pretty big (600M - 1.5G) and we expect first installs to be heavy and frequent, and therefore would like to leverage the CDN's caching capabilities to reduce latency on the file access across the globe. Hitting the central storage ended up being a very big problem for us, and therefore the decision to migrate to this scenario. Afterwards, we expect some access peaks when updates are made available (new cache entries) but some warm up strategy on the caches that we could setup could make things easier on the clients. Also we consider the files to be immutable so there is no expectation of needing to purge the caches and the header's info should be enough to decide how long to preserve the file.
    • Are there rules on the edge that affects the caching decision, such as number of downloads (only cache after a number of downloads?) or frequency?
    • Does the cache-header information dictate how the file is cached at the edge, or are other rules in play?
    • Is there some other analytics tool I can use to debug behavior?
    Monday, October 30, 2017 11:25 AM

Answers

  • The CDN should honor the 'Cache-Control:max-age" setting for cache by default, unless you tell it otherwise with a rule. Force Internal Max-Age just overrides what your header says for CDN cache duration. External Max-Age doesn't affect how the CDN caches, only what headers it sends to the browser to cache.

    I don't see why your SEAsia test would result in so many misses unless 1. your blob doesn't have cache headers set correct, 2. You have a misconfigured rule telling the CDN to not cache.

    Can you provide the full request and response headers? I'd like to help debug as it is not expected behavior.

    You can debug yourself also this way:

    1. In rules engine, turn on the feature "user-debug-request-header" for always.

    2. In your request header, add X-EC-Debug: x-ec-cache-state

    3. In the response, look for something like the following. The highlighted shows the cache duration on the CDN.

    X-EC-Debug: x-ec-cache-state: max-age=604800 (7d); cache-ts=1341802519 (Mon, 09 Jul 2012 02:55:19 GMT); cache-age=0 (0s); remaining-ttl=604800 (7d); expires-delta=none

    • Marked as answer by Miguel Alho Tuesday, November 21, 2017 3:18 PM
    Monday, November 6, 2017 10:26 PM

All replies

  • Just to add a bit more info:  I created a VM in West US 2, and performing the same test, I get constant cache misses, right after the initial cold download finished. 
    Monday, October 30, 2017 3:19 PM
  • I decided to stop the second run and restart it. This time I get a mix of hits (1) and misses (2):

    Monday, October 30, 2017 3:41 PM
  • We may have figured it out, for now: We added a "Force Internal Max-Age" rule in the CDN's rule set and it seems to provide consistent cache hits. The file's Cache-Control and External Max-age do not seem top be enough to control the caching. Is this correct? Or is just coincidence?
    Monday, October 30, 2017 6:08 PM
  • The CDN should honor the 'Cache-Control:max-age" setting for cache by default, unless you tell it otherwise with a rule. Force Internal Max-Age just overrides what your header says for CDN cache duration. External Max-Age doesn't affect how the CDN caches, only what headers it sends to the browser to cache.

    I don't see why your SEAsia test would result in so many misses unless 1. your blob doesn't have cache headers set correct, 2. You have a misconfigured rule telling the CDN to not cache.

    Can you provide the full request and response headers? I'd like to help debug as it is not expected behavior.

    You can debug yourself also this way:

    1. In rules engine, turn on the feature "user-debug-request-header" for always.

    2. In your request header, add X-EC-Debug: x-ec-cache-state

    3. In the response, look for something like the following. The highlighted shows the cache duration on the CDN.

    X-EC-Debug: x-ec-cache-state: max-age=604800 (7d); cache-ts=1341802519 (Mon, 09 Jul 2012 02:55:19 GMT); cache-age=0 (0s); remaining-ttl=604800 (7d); expires-delta=none

    • Marked as answer by Miguel Alho Tuesday, November 21, 2017 3:18 PM
    Monday, November 6, 2017 10:26 PM
  • Thanks, Richard. I, unfortunately, did not get a notification for your response, but I came back to this since I still have the problem.

    I've added the rule to verify. Without it, and reading the headers from requests to 2 different files in different accounts (with the same setup - staging and prod) I get:


    Local attempt:

    Dev headers:
            X-Cache : [HIT]
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [bd373bcc-001e-0010-1847-5abde6000000]
            x-ms-version : [2009-09-19]
            Accept-Ranges : [bytes]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:20:40 GMT]
            ETag : [0x8D51D2019BD7530]
            Server : [ECAcc,(toj/896E)]
    Prod headers:
            X-Cache : [HIT]
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [f34a879a-001e-00da-6247-5afcef000000]
            x-ms-version : [2009-09-19]
            Accept-Ranges : [bytes]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:20:40 GMT]
            ETag : [0x8D5260E268FA0ED]
            Server : [ECAcc,(toj/8961)]

    It seems to be returning the Cache-Control info (max-age 2 weeks as i have setup). 

    SEAsia (1st try):

    Dev headers:
            X-Cache : [HIT]
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [ea58c0a4-001e-004a-24ae-51bb67000000]
            x-ms-version : [2009-09-19]
            Accept-Ranges : [bytes]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:25:03 GMT]
            ETag : [0x8D51D2019BD7530]
            Server : [ECAcc,(sin/BFA5)]
    Prod headers:
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [1af1e4dd-001e-0073-6148-5a2907000000]
            x-ms-version : [2009-09-19]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:25:03 GMT]
            ETag : [0x8D5260E268FA0ED]
            Server : [Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0]

    2nd-try:

    Dev headers:
            X-Cache : [HIT]
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [ea58c0a4-001e-004a-24ae-51bb67000000]
            x-ms-version : [2009-09-19]
            Accept-Ranges : [bytes]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:26:23 GMT]
            ETag : [0x8D51D2019BD7530]
            Server : [ECAcc,(sin/BFA5)]
    Prod headers:
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [09ccce2a-001e-002a-1049-5a2c81000000]
            x-ms-version : [2009-09-19]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:26:23 GMT]
            ETag : [0x8D5260E268FA0ED]
            Server : [Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0]


    WestUS

    1st try:

    Dev headers:
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [68d57a39-001e-0040-5b49-5aa2ee000000]
            x-ms-version : [2009-09-19]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:28:30 GMT]
            ETag : [0x8D51D2019BD7530]
            Server : [Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0]
    Prod headers:
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [0e868535-001e-0059-3c49-5a5c42000000]
            x-ms-version : [2009-09-19]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:28:32 GMT]
            ETag : [0x8D5260E268FA0ED]
            Server : [Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0]

    2nd try:

    Dev headers:
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [a193060d-001e-0007-2449-5a7d85000000]
            x-ms-version : [2009-09-19]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:29:15 GMT]
            ETag : [0x8D51D2019BD7530]
            Server : [Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0]
    Prod headers:
            X-Cache : [HIT]
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [0e868535-001e-0059-3c49-5a5c42000000]
            x-ms-version : [2009-09-19]
            Accept-Ranges : [bytes]
            Cache-Control : [max-age=1209600]
            Date : [Fri, 10 Nov 2017 17:29:16 GMT]
            ETag : [0x8D5260E268FA0ED]
            Server : [ECAcc,(sec/96AC)]

    (this is, once again, without the debug rule active yet)

    Friday, November 10, 2017 5:30 PM
  • Thanks for the details. This behavior is unexpected, so lets try to get to the bottom of it.

    1. Try requesting the SEAsia prod instance 3 times in succession. It can take up to the 3rd request for it to be returned from cache.

    2. Try purging your content, and trying the test again.

    3. Please share the request headers used as well.

    4. Please share the entire set of rules you have enabled.

    Friday, November 10, 2017 8:14 PM
  • (a few days later)

    With the debug headers active (single request):

    USWEST: 

    Dev headers:
            X-Cache : [HIT]
            X-EC-Cache-State : [max-age=1209600 (14d); cache-ts=1510334956 (Fri, 10 Nov 2017 17:29:16 GMT); cache-age=233143 (2d 16h 45m 43s); remaining-ttl=976457 (11d 7h 14m 17s); expires-delta=none]
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [a193060d-001e-0007-2449-5a7d85000000]
            x-ms-version : [2009-09-19]
            Accept-Ranges : [bytes]
            Cache-Control : [max-age=1209600]
            Date : [Mon, 13 Nov 2017 10:14:59 GMT]
            ETag : [0x8D51D2019BD7530]
            Server : [ECAcc,(sec/9789)]
    Prod headers:
            X-Cache : [HIT]
            X-EC-Cache-State : [max-age=1209600 (14d); cache-ts=1510334913 (Fri, 10 Nov 2017 17:28:33 GMT); cache-age=233186 (2d 16h 46m 26s); remaining-ttl=976414 (11d 7h 13m 34s); expires-delta=none]
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [0e868535-001e-0059-3c49-5a5c42000000]
            x-ms-version : [2009-09-19]
            Accept-Ranges : [bytes]
            Cache-Control : [max-age=1209600]
            Date : [Mon, 13 Nov 2017 10:14:59 GMT]
            ETag : [0x8D5260E268FA0ED]
            Server : [ECAcc,(sec/96AC)]

    SEAsia:

    Dev headers:
            X-Cache : [HIT]
            X-EC-Cache-State : [max-age=1209600 (14d); cache-ts=1509388934 (Mon, 30 Oct 2017 18:42:14 GMT); cache-age=1179305 (13d 15h 35m 5s); remaining-ttl=30295 (8h 24m 55s); expires-delta=none]
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [ea58c0a4-001e-004a-24ae-51bb67000000]
            x-ms-version : [2009-09-19]
            Accept-Ranges : [bytes]
            Cache-Control : [max-age=1209600]
            Date : [Mon, 13 Nov 2017 10:17:19 GMT]
            ETag : [0x8D51D2019BD7530]
            Server : [ECAcc,(sin/BFA5)]
    Prod headers:
            X-Cache : [HIT]
            X-EC-Cache-State : [max-age=1209600 (14d); cache-ts=1510334783 (Fri, 10 Nov 2017 17:26:23 GMT); cache-age=233456 (2d 16h 50m 56s); remaining-ttl=976144 (11d 7h 9m 4s); expires-delta=none]
            x-ms-blob-type : [BlockBlob]
            x-ms-lease-status : [unlocked]
            x-ms-request-id : [09ccce2a-001e-002a-1049-5a2c81000000]
            x-ms-version : [2009-09-19]
            Accept-Ranges : [bytes]
            Cache-Control : [max-age=1209600]
            Date : [Mon, 13 Nov 2017 10:17:19 GMT]
            ETag : [0x8D5260E268FA0ED]
            Server : [ECAcc,(sin/BF4A)]

    There seems to be info about the file being in cache (both know that the file has been chached for 2 days) which is good. 

    Rules-wise I have the following (based on the XML from the notification, and cleaned up a bit ):

    <rules >

      <rule ...>

    <description>Enable Token auth</description>

        <!--If-->
        <match.always>
          <feature.token-auth value="true" />
          <feature.token-ignore-url-case value="true" />
        </match.always>
      </rule>
      <rule ...>
        <description>2 weeks of cache</description>
        <!--If-->
        <match.always>
          <feature.external-ttl value="2" units="weeks" />
        </match.always>
      </rule>
      <rule ...>
        <description>Force Internal Max Age</description>
        <!--If-->
        <match.always>
          <feature.force-internal-maxage status="200" value="1209600" units="seconds" />
        </match.always>
      </rule>
      <rule ...>
        <description>Debug cache</description>
        <!--If-->
        <match.always>
          <feature.user-debug-request-header value="true" />
        </match.always>
      </rule>
    </rules>


    • Edited by Miguel Alho Monday, November 13, 2017 2:19 PM removed html from grammarly in text
    Monday, November 13, 2017 11:15 AM
  • After warming up the cache a few times, I can still see multiple misses. US instances seem to be caching correctly but SEAsia not so much. In one of the sets, we want to cache has about 10K+ files, and that group suffers a lot due to the number of misses. Once it gets to that set, we get all misses:

    I currently don't have anything in place to handle the response headers for all the files requested (though that could be a nice piece of middleware to add to the HTTP request pipeline, even if to activate only in a debug session and analyze results).

    This brings up a few questions on my side, that I need to consider to move forward:

    • This there any type of rules on the CDN servers that override TTL info? I can imagine that some limits may be in place (max file count, disk quota, upper bound for TTL) that could affect the caching. Could the behavior I see be related to this? Are these possible limits published anywhere?
    • We have plans to support some bundling of the files to reduce the number of items to download, but it won't be implemented for some time due to priorities. Are there any patterns you'd recommend for CDN usage?
    • Are there any recommendations related to monitoring practices to include in the app, such as reading/analyzing the debug cache headers? Is it a viable option to leave the rule on in production?

    Thanks!

    Monday, November 13, 2017 3:19 PM
  • Relative to Headers, currently only sending the X-EC-Debug: x-ec-cache-state header you recommended. All others are C# s HttpClient default headers.

    I introduced the Request handler to track the header values on the response. I've added the generated file (after clearing the URLs). The first half of the sequence has successive hits, but halfway through it begins to fail. A simple count for X-Cache: "#NA#" (header not part of response) gives me about 30% misses.

    https://www.dropbox.com/s/u46kk92ndvz6myp/cleanCacheDiagnostic.txt?dl=0

    Thanks!


    • Edited by Miguel Alho Tuesday, November 14, 2017 11:09 AM
    Tuesday, November 14, 2017 11:08 AM
  • Hi Miguel,

    There isn't any error with your configuration, but I reached the point that I need assistance from Verizon support engineers to dive deeper. Please email me at rli@microsoft.com so that I can open up a case with you and Verizon support.

    Regarding your questions

    1. CDN servers have built in algorithms to evict caches when they are under high load, but this doesn't seem to be one of those cases. We need to work with support to dig into logs.

    2. There is a known issue with caching and bundling. See here for a workaround. https://stackoverflow.com/questions/12047981/how-to-upload-bundled-and-minified-files-to-windows-azure-cdn

    3. Yes you can leave the rule on in production.

    Wednesday, November 15, 2017 1:58 AM
  • Thanks, Richard. Before moving to support, I want to explore one more route that you're response header logging suggestion help get at.

    For some reason - most likely a configuration error on our services and poor code on our client - may have brought forward a root cause for the misses - one we probably would not find without the header suggestion. We seem to be getting 404s from the edge, and digging deeper, the cause is a "Blob not found <g class="gr_ gr_339 gr-alert gr_gramm gr_inline_cards gr_disable_anim_appear Grammar only-ins replaceWithoutSep" data-gr-id="339" id="339">error</g>". We were not handling the 404 correctly on the client so we were still downloading "a file" - but just with a serialized error message content with the response message I just mentioned. I decided to log the response code with the X-Cache response header, and some misses were in fact related to the 404.

    We've resynced the data source Db and storage, and I'm going to run some extra tests to see if the problem has disappeared. I'd suggest adding to your initial response <g class="gr_ gr_811 gr-alert gr_gramm gr_inline_cards gr_disable_anim_appear Grammar multiReplace" data-gr-id="811" id="811">a 4th</g> step - log the response status code along with the cache headers to correlate them.

    Anyway, I'm going t test a bit further and come to a conclusion, and hopefully close this issue!

    Once again, thanks!

    Wednesday, November 15, 2017 4:32 PM
  • Hi. I-ve closed up the analysis on this. The X-EC-Debug: x-ec-cache-state debug header is really helpful here. Using it and logging it, we were able to find a couple of issues that our client code was not handling correctly, namely 404s and what I believe to be timeouts related to large, uncached files and the latency associated to it. Logging this as you suggested really helped.

    Thanks!

    Tuesday, November 21, 2017 3:22 PM