Prevent origin pull from caching pages (only content)

  • Question

  • I set up my Azure CDN to pull from my custom origin domain. My site is an MVC 5 site and I'm using Azure CDN to pull assets (i.e. /Content, /Scripts). By default, web pages include the Cache-Control: private HTTP header--but Azure CDN will still serve requests to those pages.

    For example, my homepage is mywebsites.com. If I go to the root CDN: http://az1234.vo.msecnd.net/ then I see my homepage--I don't want that, I only want assets that have a cache header to be served, other requests should be 404 Not Found. Since I'm not serving the entire site over the CDN, this is what I want.

    Questions:

    1. Does Cache-Control: private prevent CDN from caching? If so, am I seeing pages on the CDN because it's just passing through? If that's true, how can I prevent that? A different cache header (no-cache?)?

    2. If Azure CDN doesn't support excluding pages, would setting up a URL rewrite rule that looks for the CDN URL and denies requests except for whitelisted paths work?

    Thanks.


    Kamran A

    Friday, October 16, 2015 5:37 PM

All replies

  • Azure CDN honors the Cache-Control: private header, in addition to the Cache-Control: no-store, Cache-Control: no-cache, Pragma: no-cache, Cache-Control: s-maxage, Cache-Control: max-age, and Expires headers. If none of these headers is included in the response from the origin server, content is cached for a default of 7 days.
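
As a rough illustration of the header rules above, here is a minimal Python sketch of how a shared cache might pick a lifetime for a response. The function name `cdn_ttl_seconds` is hypothetical, and the precedence (s-maxage before max-age before Expires) follows general HTTP caching conventions; it is an assumption, not confirmed Azure CDN behavior. Treating no-cache as "never stored" is also a simplification.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60  # the 7-day default mentioned above


def cdn_ttl_seconds(headers, now=None):
    """Return an assumed cache lifetime in seconds, or 0 for 'do not cache'."""
    now = now or datetime.now(timezone.utc)
    lowered = {k.lower(): v for k, v in headers.items()}

    # Split "public, max-age=60" into {"public": "", "max-age": "60"}.
    directives = {}
    for part in lowered.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    if {"private", "no-store", "no-cache"} & directives.keys():
        return 0
    if "no-cache" in lowered.get("pragma", "").lower():
        return 0

    # Assumed precedence: s-maxage, then max-age, then Expires.
    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return max(0, int(directives[name]))
            except ValueError:
                return 0

    if "expires" in lowered:
        try:
            expires = parsedate_to_datetime(lowered["expires"])
        except (TypeError, ValueError):
            return 0  # an unparseable date is treated as already expired
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        return max(0, int((expires - now).total_seconds()))

    return DEFAULT_TTL_SECONDS
```

For example, `cdn_ttl_seconds({"Cache-Control": "private"})` returns 0, while a response with no caching headers at all falls through to the 7-day default.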

    There are a number of reasons why you may not be seeing the caching behavior you expect. To determine why you are seeing the current behavior we would need you to provide the hostname for your website that you have created the CDN endpoint for.

    Note that it is possible to create an endpoint and use it for just a specific directory. To do this you would create a new CDN endpoint and for origin type select custom origin. This will allow you to enter in the hostname and the specific path for your website. For example you could specify mywebsites.com/images. This would result in the CDN making requests to mywebsites.com/images but users would just see a CDN endpoint such as az1234.vo.msecnd.net. You would need to create multiple CDN endpoints if you needed the CDN to access content from multiple directories.

    Later this year we will be introducing functionality that will allow you to create rewrite rules for an endpoint so that you can achieve this behavior with a single endpoint.

    Sunday, October 18, 2015 8:51 AM
  • Sure the origin is http://keeptrackofmygames.com and the CDN endpoint is: https://az792935.vo.msecnd.net

    The pages are all returning Cache-Control: private yet they still show up on the CDN. Is there different behavior with other cache control headers (like no-cache)?


    Kamran A

    Sunday, October 18, 2015 2:28 PM
  • From looking at the server response headers for requests to your website, the CDN is correctly honoring the Cache-Control headers that you are using. Because your CDN endpoint currently points to the root of your website, all content will flow through the CDN. The CDN determines which content to cache based on the cache headers that you have specified for the content. If you don't specify how specific content should be cached, it will by default be cached for 7 days on individual CDN POPs. Note that each CDN POP caches content individually. You can look at the Server and X-Cache response headers to determine how the CDN is treating your content from a caching perspective. To illustrate this, let's look at the response headers for two example URLs from your website: one that should be cached and one that shouldn't.

    1) http://az792935.vo.msecnd.net - This is the root of your website. Following are the response headers being returned from the CDN:

    Content-Encoding: gzip
    Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
    Access-Control-Allow-Methods: POST,GET,OPTIONS,PUT,DELETE
    Cache-Control: private
    Content-Type: text/html; charset=utf-8
    Date: Mon, 19 Oct 2015 01:10:42 GMT
    Server: Microsoft-IIS/8.0
    Set-Cookie: __RequestVerificationToken=YrrHse_71WmXxkcRnVIqauN5hU1l1ACbUSs0PHIKS-snKBT_fA0g2e4TBPp9oMTFxmNkIVsx7m7zcdYAtDwO1OQsZOQPzJD6qJffFf9zkT01; path=/; HttpOnly
    Set-Cookie: ARRAffinity=1442076bcd84c741a76ae6c7941a506a076d05128d7216aa131b142b9663e7d8;Path=/;Domain=keeptrackofmygames.com
    Vary: Origin,Accept-Encoding
    X-AspNet-Version: 4.0.30319
    X-Frame-Options: SAMEORIGIN
    X-Frame-Options: DENY
    X-UA-Compatible: IE=Edge,chrome=1
    Content-Length: 8928

    Note that the Cache-Control: private header is showing up here. The absence of the X-Cache header and the value of the Server header (i.e. Microsoft-IIS/8.0) indicate that this response is being returned from your origin. So content is flowing through the CDN, but this page isn't being cached on the CDN. This isn't the most optimized configuration for performance, as you are adding an extra network hop for delivery of this page. We are working on CDN acceleration capabilities (targeted for availability next year) that would better fit this scenario: they will provide network acceleration for non-cached content, so that delivery to clients via the CDN is in most cases faster than delivery directly from your website.

    2) https://az792935.vo.msecnd.net/cassette.axd/file/Content/images/logo-30x30-trans-907de7a88b606e5b5a774f087071ca8afa65c296.png - Based on the cache response headers, this content should be cached by the CDN. Note that by default it takes two requests to the CDN before content is cached, so the first two requests will be cache misses before one sees a cache hit from the CDN for a specific request. Let's first take a look at the response headers returned before the CDN caches the content.

    Accept-Ranges: bytes
    Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
    Access-Control-Allow-Methods: POST,GET,OPTIONS,PUT,DELETE
    Cache-Control: public, max-age=31536000
    Date: Mon, 19 Oct 2015 01:08:42 GMT
    Etag: "0183979056f0b87616cd99d5c54a48f3b771eee6"
    Expires: Tue, 18 Oct 2016 01:08:42 GMT
    Last-Modified: Sun, 02 Nov 2014 20:36:03 GMT
    Server: Microsoft-IIS/8.0
    Set-Cookie: ARRAffinity=1442076bcd84c741a76ae6c7941a506a076d05128d7216aa131b142b9663e7d8;Path=/;Domain=keeptrackofmygames.com
    Vary: Origin
    X-Frame-Options: DENY
    X-UA-Compatible: IE=Edge,chrome=1

    Note that the Cache-Control max-age directive is showing up here with a value in seconds (31536000) that indicates the content should be cached on the CDN for 365 days. The response headers don't yet indicate that this content is being cached on the CDN, as the Server header just shows Microsoft-IIS/8.0 and there isn't an X-Cache response header. Let's now look at the response headers after the CDN has cached this content:

    Accept-Ranges: bytes
    Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
    Access-Control-Allow-Methods: POST,GET,OPTIONS,PUT,DELETE
    Cache-Control: public, max-age=31536000
    Content-Type: image/png
    Date: Mon, 19 Oct 2015 01:09:50 GMT
    Etag: "907de7a88b606e5b5a774f087071ca8afa65c296"
    Expires: Tue, 18 Oct 2016 01:09:50 GMT
    Last-Modified: Sun, 02 Nov 2014 20:36:04 GMT
    Server: ECAcc (pae/3725)
    X-Cache: HIT
    X-Frame-Options: DENY
    X-UA-Compatible: IE=Edge,chrome=1
    Content-Length: 599

    Note that the X-Cache header is showing up with a value of HIT, indicating that the content is being cached, and the Server header is now showing a new value of ECAcc (pae/3725), which indicates which CDN platform (ECAcc) and which POP (pae) the cached content is being delivered from. Overall, whenever cached content is returned from the CDN, the Server header uses the following syntax: platform (POP/ID).
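
The header checks described above can be sketched in a few lines of Python. This is only an illustration: the function name `describe_cdn_response` and the returned dictionary keys are hypothetical, and the regular expression assumes the Server value follows the "platform (POP/ID)" syntax exactly as shown in the example.

```python
import re

# Matches values such as "ECAcc (pae/3725)": platform, then "(POP/ID)".
SERVER_PATTERN = re.compile(r"^(?P<platform>\S+) \((?P<pop>[^/]+)/(?P<id>[^)]+)\)$")


def describe_cdn_response(headers):
    """Summarize whether a response looks like a CDN cache hit."""
    lowered = {k.lower(): v for k, v in headers.items()}
    x_cache = lowered.get("x-cache", "")
    match = SERVER_PATTERN.match(lowered.get("server", ""))
    return {
        # No X-Cache header at all means the response passed through from the origin.
        "served_from_cache": x_cache.strip().upper() == "HIT",
        "platform": match.group("platform") if match else None,
        "pop": match.group("pop") if match else None,
        "id": match.group("id") if match else None,
    }
```

Passing the cached response headers shown above gives `served_from_cache` as True with platform `ECAcc` and POP `pae`; passing the homepage headers (Server: Microsoft-IIS/8.0, no X-Cache) gives False with no platform or POP.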

    Monday, October 19, 2015 2:10 AM
  • Okay, this explains what I'm seeing.

    My specific need is to prevent the CDN from even serving those pages. In order to do so, could I use the URL Rewrite module to look for the CDN host header and only whitelist the folders I want, or will that not work? Since the CDN is passing through to my origin, that should work, right?

    For example:

    cdn.url.net/index => deny, 404
    cdn.url.net/cassette.axd/ => OK, cache

    Thanks for the detailed explanation.


    Kamran A

    Monday, October 19, 2015 5:49 PM
  • The host header you see in requests to your origin would just be the host name of your origin, so I don't see how you could use it to differentiate requests from the CDN from other requests to your origin. The CDN does add a "Via: proxy" header, but this is a pretty generic header. Overall, sending back a 404 for content you don't want the CDN to cache will prevent the CDN from caching that content, but it is far from an ideal workflow. Why do you want requests to go to the CDN for content that you never want the CDN to serve? How would clients access this content? What is the end-to-end workflow you want to achieve here?
    Thursday, October 22, 2015 9:13 PM
  • Simply because the CDN is not my site. My site is my site. I only want the CDN for assets, not for mirroring. And I don't want to use blob storage because that workflow is sub-par--origin pull is the easiest workflow because all I do is point the CDN to my site. But I don't want Google or search engines or users to be able to browse my site through the CDN--my site is very dynamic with tons of single-page-appy stuff and makes no sense to serve over a CDN. All I need to host on a CDN is assets.

    Why wouldn't a URL rewrite rule like this work? I haven't tested it yet.

    <rule name="Redirect CDN non-asset URLs to TLD" stopProcessing="true">
        <match url="(.*)" />
        <conditions>
            <add input="{HTTP_HOST}" pattern="^az1234\.ms\.vo\.msecnd\.net$" />
            <!-- add whitelist patterns to negate -->
        </conditions>
        <action type="Redirect" url="http://keeptrackofmygames.com/{R:0}" redirectType="Permanent" appendQueryString="true" />
    </rule>


    Kamran A

    Friday, October 23, 2015 12:54 PM
  • As per my last response, the host header that your origin sees will just be the name of your origin. It will not be the hostname of the CDN endpoint, so you can't use the host header to filter requests that come from the CDN. If the assets you need to serve from the CDN come from one or more virtual directories that contain only CDN content, you can create one or more CDN endpoints for this purpose. For example, if all your content for the CDN is in YourOrigin.com/images1 and YourOrigin.com/images2, you can create two CDN endpoints that are restricted to serving content directly from the images1 and images2 virtual directories. You can accomplish this when you create a CDN endpoint by selecting custom origin as the origin type. This allows you to specify both an origin hostname and a path as the origin URL. So if you create a CDN endpoint by specifying YourOrigin.com/images1, you would end up with a CDN endpoint URL such as az12345.vo.msecnd.net. A URL from your origin such as http://YourOrigin.com/images1/blue.jpg would translate to a URL from the CDN of just http://az12345.vo.msecnd.net/blue.jpg.
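
To make the URL translation above concrete, here is a small Python sketch of the mapping from an origin URL to its CDN equivalent when the endpoint is scoped to a directory. The helper name `to_cdn_url` and its parameters are hypothetical; it only models the path arithmetic described in this post, not any Azure API.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn_url(origin_url, origin_path, cdn_host):
    """Map an origin URL to the CDN URL for an endpoint scoped to origin_path.

    origin_path is a non-empty directory such as "images1". Returns None
    when the URL lies outside that directory, since such content is not
    reachable through the scoped endpoint.
    """
    parts = urlsplit(origin_url)
    prefix = "/" + origin_path.strip("/")
    if parts.path != prefix and not parts.path.startswith(prefix + "/"):
        return None
    # Drop the directory prefix; the endpoint root maps to that directory.
    remainder = parts.path[len(prefix):] or "/"
    return urlunsplit((parts.scheme, cdn_host, remainder, parts.query, ""))
```

With the values from the example, `to_cdn_url("http://YourOrigin.com/images1/blue.jpg", "images1", "az12345.vo.msecnd.net")` yields `http://az12345.vo.msecnd.net/blue.jpg`, while a page URL such as `http://YourOrigin.com/index` yields None.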

    • Marked as answer by subkamran Friday, November 6, 2015 8:15 PM
    Sunday, October 25, 2015 3:36 AM
  • Thanks. Maybe I'll just wait until I can add some exclusion rules or something (i.e. "Exclude all assets with Cache-Control: private" so it won't even process those requests).

    Kamran A

    Friday, November 6, 2015 8:16 PM
  • Have you tried filtering by the HTTP_X_HOST header? We disabled site browsing on the CDN domain (except for a specific set of folders that serve resources) by adding the following urlRewrite rule to web.config:

    <rule name="Deny site browsing on CDN domain" stopProcessing="true">
      <match url="^(storage/|ui/)(.*)" negate="true" />
      <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
        <add input="{HTTP_X_HOST}" pattern="\.azureedge.net$" />
      </conditions>
      <action type="CustomResponse" statusCode="404" statusReason="Not found" statusDescription="Not found" />
    </rule>
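
For readers who want to reason about what this rule does before deploying it, the decision logic can be approximated in Python. This is a sketch under assumptions: that the CDN forwards its endpoint hostname in an X-Host request header (as this post reports, not independently verified here), that URL Rewrite matches the path without its leading slash, and that pattern matching is case-insensitive. The function name `rule_denies` is hypothetical.

```python
import re

# Same whitelist as the rule's <match url=... negate="true" />.
ALLOWED_PATHS = re.compile(r"^(storage/|ui/)(.*)", re.IGNORECASE)
# The dot before "net" is escaped here; the rule above leaves it unescaped.
CDN_HOST = re.compile(r"\.azureedge\.net$", re.IGNORECASE)


def rule_denies(path, x_host):
    """Return True when the rule would answer the request with a 404."""
    url = path.lstrip("/")  # assumed: the rule sees the path without a leading slash
    came_via_cdn = x_host is not None and CDN_HOST.search(x_host) is not None
    return came_via_cdn and ALLOWED_PATHS.match(url) is None
```

Under these assumptions, a request for `/index` with X-Host `example.azureedge.net` is denied, a request for `/ui/app.js` with the same header is allowed, and any request without the header is left alone.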

    Wednesday, December 9, 2015 10:07 AM
  • If this works I'll give you a virtual hug. Thanks for the tip!

    Kamran A

    Wednesday, December 9, 2015 5:49 PM