SharePoint crawl with IIS Rewrite rules

  • Question

  • All,

    I am running into an issue when trying to crawl our internet-facing SharePoint site, which makes use of the IIS URL Rewrite 2.0 module. If I try to crawl the site, I get a warning and an error in the crawl log. The warning says "The URL was permanently moved. (URL redirected to ...)". The error says "Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to this repository".

    Anonymous access is enabled on the site. Despite this, I made the default content access account a reader on this site. I also added a crawl rule to include everything in the site. The error I get now is "The filtering process has been terminated".

    If I completely remove the rewrite rules, the site can be crawled. However, this is not an option going forward.
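
    For illustration only (a hypothetical sketch, not the site's actual configuration): a rule of roughly this shape, using a permanent redirect, produces the kind of 301 response that shows up in the crawl log as "The URL was permanently moved":

    <rewrite>
      <rules>
        <!-- Hypothetical legacy-URL rule: redirectType="Permanent" returns a 301,
             which the crawler records as a permanently moved URL. -->
        <rule name="Legacy CMS URL" stopProcessing="true">
          <match url="^news/(.*)$" />
          <action type="Redirect" url="/Pages/news/{R:1}" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>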

    Any ideas how I can get around this?

    Thanks

    Wednesday, December 5, 2012 3:11 PM

All replies

  • It's generally not a good idea to rewrite SharePoint URLs. May I ask what problem you are trying to solve with URL Rewrite?

    Jason Warren
    Infrastructure Architect


    • Edited by Jason Warren Wednesday, December 5, 2012 9:01 PM
    Wednesday, December 5, 2012 9:01 PM
  • Sure Jason, thank you for asking.

    We work together... and our goal was to maintain the exact URLs we had in our old CMS system prior to the migration to SP 2010, and to avoid having Google and other search engines re-index the website; that would be a big risk for us in terms of SEO and the revenue that comes from organic searches.
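
    As a rough illustration (hypothetical URL patterns, not our actual rules), a Rewrite action is what keeps the old public URL in place while SharePoint serves the page behind it; a Redirect action would instead send visitors, and the crawler, to the new URL:

    <rewrite>
      <rules>
        <!-- Hypothetical mapping: the old CMS-style URL /articles/123.html is
             served by a SharePoint page without changing the visible URL. -->
        <rule name="Old CMS article URLs" stopProcessing="true">
          <match url="^articles/([0-9]+)\.html$" />
          <action type="Rewrite" url="/Pages/Article.aspx?id={R:1}" />
        </rule>
      </rules>
    </rewrite>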

    I guess the question here is not why we used "IIS URL Rewrite 2.0", but why the crawler stops or is unable to crawl a rewritten website...

    In my mind the SP Search crawler should act as a spider and simply find all pages, no matter the source. Is this not the case?

    Thanks in advance!


    Life is Good

    Wednesday, December 5, 2012 9:31 PM
  • Well, to my knowledge URL rewriting is not supported by Microsoft, so there's not much info out there. This previous thread looks like it details the configuration needed to get URL Rewrite and SharePoint to play nice. I would suggest starting there and seeing if the crawl still has issues.


    Jason Warren
    Infrastructure Architect

    Wednesday, December 5, 2012 9:36 PM
  • Well, to my knowledge URL rewriting is not supported by Microsoft, so there's not much info out there. This previous thread looks like it details the configuration needed to get URL Rewrite and SharePoint to play nice. I would suggest starting there and seeing if the crawl still has issues.


    Jason Warren
    Infrastructure Architect

    Hi Jason, it is supported: http://www.iis.net/downloads/microsoft/url-rewrite. However, it does have some funkiness with search and quite a few caveats.

    Here's a guide on how to get it working and it explains some of the backend weirdness: http://myspworld.wordpress.com/2012/10/30/url-rewriting-part-3-integrating-with-sharepoint-2010/


    My CodePlex - My Blog - My Twitter

    • Marked as answer by Qiao Wei Tuesday, December 18, 2012 1:56 AM
    Saturday, December 8, 2012 9:47 AM
  • Hi Maarten.

    I reviewed the 2 links you posted above, but I do not see anything relevant to search, especially crawling. Can you please be more specific?

    Thanks

    Tuesday, December 18, 2012 9:19 PM
  • Once the steps for SharePoint are complete, you should be able to do a search crawl and it will return friendly URLs.

    My CodePlex - My Blog - My Twitter

    Tuesday, December 18, 2012 11:31 PM
  • This may help someone.

    We used URL Rewrite to redirect from HTTP to HTTPS, but the crawler was failing when the rewrite rule was on. Once we turned off the rule, it worked.

    What we needed to do was add an exclusion condition to prevent the redirect from HTTP to HTTPS if the USER_AGENT matched the pattern "MS Search". This is the USER_AGENT that is doing the crawling.

    The rule below, from web.config, redirects from HTTP to HTTPS unless the USER_AGENT string contains "MS Search".

    The key line is:
    <add input="{HTTP_USER_AGENT}" pattern="MS Search" negate="true" />

    Once we added this, everything worked.

    <rewrite>
      <rules>
        <rule name="HTTP to HTTPS redirect" enabled="true" stopProcessing="true">
          <match url="(.*)" />
          <conditions>
            <!-- Only redirect requests that arrive over plain HTTP... -->
            <add input="{HTTPS}" pattern="^OFF$" />
            <!-- ...and skip the redirect when the request comes from the SharePoint crawler. -->
            <add input="{HTTP_USER_AGENT}" pattern="MS Search" negate="true" />
          </conditions>
          <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" redirectType="Found" />
        </rule>
      </rules>
    </rewrite>
    • Proposed as answer by Aaron Wiggans Friday, February 26, 2016 8:56 PM
    Friday, February 26, 2016 8:56 PM
  • Aaron! Thank you so much for posting this, it was exactly what we were looking for.

    Tuesday, November 22, 2016 1:02 AM