Access denied when crawling public internet site
-
Friday, May 21, 2010 8:05 PM
I have a dev and a QA SharePoint 2010 server farm environment. In my dev environment I have a search content source set up to crawl a series of public-facing (non-SharePoint) web sites. The crawling works fine, no issues.
I recreated the exact same content source in my QA environment and when the crawl runs I get "Access Denied" errors for each of the web sites listed in the content sources. I verified that I can access the sites by using a browser on the QA crawler server.
To debug the issue I loaded up Fiddler on both the dev and QA environments. Using a proxy configuration change I was able to watch the search crawler on the dev farm successfully crawl the public-facing web sites. With the same configuration on QA I saw that each web site returned a 401 access denied error.
I compared the HTTP headers of the requests on dev and QA to see if there was any difference. For some reason, on the QA server the GET request sent by the crawler includes an Authorization header. The dev request does not.
I have checked the content sources and crawl rules on both dev and QA; both servers are exactly the same. Any ideas on why the crawler would be sending an invalid Authorization header to a public-facing web site even when the site doesn't request any authentication information? I thought an Authorization header was only sent after the web server demanded it and had provided the types (NTLM, Basic, etc.) that it supported back to the requestor.
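For anyone wanting to repeat the dev/QA comparison, here is a minimal Python sketch (not part of the original post) that diffs the headers of two raw requests copied out of a Fiddler capture. The function names, host, user agent, and Authorization value are all hypothetical placeholders, not real crawler output.

```python
def parse_request_headers(raw_request):
    """Parse the header lines of a raw HTTP request into a dict keyed by lower-case name."""
    lines = raw_request.strip().splitlines()
    headers = {}
    for line in lines[1:]:        # skip the request line, e.g. "GET / HTTP/1.1"
        if not line.strip():      # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def header_diff(dev_raw, qa_raw):
    """Report which header names differ between the two captured requests."""
    dev = parse_request_headers(dev_raw)
    qa = parse_request_headers(qa_raw)
    shared = set(dev) & set(qa)
    return {
        "only_in_dev": sorted(set(dev) - set(qa)),
        "only_in_qa": sorted(set(qa) - set(dev)),
        "different_value": sorted(k for k in shared if dev[k] != qa[k]),
    }


# Hypothetical captures; every value below is a placeholder.
DEV_REQUEST = """GET / HTTP/1.1
Host: www.example.com
User-Agent: ExampleCrawler/1.0
"""

QA_REQUEST = """GET / HTTP/1.1
Host: www.example.com
User-Agent: ExampleCrawler/1.0
Authorization: NTLM PLACEHOLDER
"""

if __name__ == "__main__":
    print(header_diff(DEV_REQUEST, QA_REQUEST))
```

With captures like the ones described above, the Authorization header would be expected to appear only on the QA side.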
All Replies
-
Friday, May 28, 2010 5:18 PM
I just built another single server install of SharePoint 2010 on Windows 7 and am seeing the exact same issue. When the server crawls public sites it is trying to pass authentication credentials which causes the remote web site to return a 401 access denied error.
Anyone else seeing this issue?
-
Tuesday, June 01, 2010 1:29 PM
I have once again verified that my original dev environment does not send the HTTP "Authorization" header in the initial GET request for crawled content. Only my QA farm and the recently built single server install do.
What would cause SharePoint to automatically send the authorization header to a website? Isn't the protocol supposed to be:
- Attempt anonymous connection
- The server can deny the request and require authentication. The server returns a header indicating the authentication scheme it supports
- The connection attempt is made again, this time with a compatible authentication scheme (authorization header in http request)
- If the authentication is successful the requested content is returned, otherwise the server returns an access denied message.
I must have something configured differently between farms... I have yet to see what that would be. Maybe a bug in SharePoint?
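The expected sequence described in this post can be checked against a capture with a small Python sketch like the one below (added for illustration, not from the original thread). It assumes each exchange has been reduced by hand to a tuple of request headers, response status, and WWW-Authenticate values, and that each WWW-Authenticate value carries a single challenge. The function names are hypothetical.

```python
def offered_schemes(www_authenticate_values):
    """Return the scheme name from each WWW-Authenticate value, e.g. ['Negotiate', 'NTLM']."""
    return [v.split(None, 1)[0] for v in www_authenticate_values if v.strip()]


def check_exchanges(exchanges):
    """Walk (request_headers, status, www_authenticate_values) tuples for one URL.

    Flags any request that carries an Authorization header before the
    server has answered with a 401 challenge, and records each challenge.
    """
    challenged = False
    findings = []
    for i, (req_headers, status, challenges) in enumerate(exchanges, start=1):
        names = {name.lower() for name in req_headers}
        if "authorization" in names and not challenged:
            findings.append(f"request {i}: Authorization sent before any challenge")
        if status == 401:
            challenged = True
            findings.append(
                f"response {i}: 401, server offers {offered_schemes(challenges)}"
            )
    return findings


if __name__ == "__main__":
    # Hypothetical exchange resembling the QA behavior described in this thread.
    qa_like = [({"Host": "www.example.com", "Authorization": "NTLM PLACEHOLDER"}, 401, ["NTLM"])]
    for line in check_exchanges(qa_like):
        print(line)
```

A capture that follows the anonymous-first sequence would produce no "sent before any challenge" finding, while the QA behavior described here would.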
-
Wednesday, June 02, 2010 6:50 PM
Another update...
The problem only occurs on SharePoint 2010 farms or standalone installations running on Windows Server 2008 R2 or Windows 7. I have a Windows Server 2008 farm and also a Windows Server 2008 standalone installation that work correctly.
I have enlisted the help of Microsoft tech support to track down this issue. I would love to hear reports from anyone else that is running a farm on Windows Server 2008 R2 or a standalone install on Windows 7.
-
Friday, June 04, 2010 12:19 AM
Spoke with Microsoft product support and they were able to reproduce this issue. Once I get a resolution I will post it here to help others.
-
Monday, June 07, 2010 11:01 PM
Update on this saga. It appears that I am the first person who has reported this issue to Microsoft support. They have confirmed there is a problem; however, they are not sure of the cause at this time. I was asked today to hold tight for a few more days to give them a chance to dig deep into this issue.
Mike Hacker | Blog: http://mphacker.spaces.live.com
-
Wednesday, July 14, 2010 5:50 PM
I'm also having the same issue. We have a farm install in our dev environment that can crawl anonymous sites with no issue. However, using the same settings and content sources on my Windows 7 standalone install produces the same issues mentioned above. I tried different accounts, proxies, authentication... No go...
Working Server Version - Windows Server 2008 (no R2). SP 2010 Farm Install with 1 web / sql
Non-Working Server Version - Windows 7. Standalone SP 2010 Install
The internal sites and file shares crawl without issue, but like Mike says, Fiddler shows access denied errors while crawling the anonymous sites.
You're not alone!
-
Wednesday, July 14, 2010 7:46 PM
Have they finished digging deeper yet? Are you still sitting tight? Any resolution? Thanks for the tip(s), if any!
-
Monday, July 19, 2010 3:41 PM
Hi Mike,
I'm having the same problem. So far I've been able to isolate that (I think) this only happens if the web server is running IIS. I've been able to crawl a few non-Microsoft sites successfully, but this seems to be a problem for WordPress hosted on IIS as well as SharePoint 2010 sites. A SharePoint 2010 example is http://sharepoint.microsoft.com. A WordPress on IIS example is http://tristanwatkins.com. I'd be interested if you get a resolution to this, as some of the sites we want to crawl are not sites that we administer.
Cheers,
Tristan
-
Friday, July 23, 2010 10:34 AM
Hi all,
I have a site in my dev environment with anonymous access enabled, and it crawled properly. But when we hosted the same site on the internet and configured alternate access mapping, the crawl fails with access denied. Can anyone help?
-
Thursday, January 27, 2011 10:35 AM
What is the status of this issue? We've got an SP 2010 farm install with 1 web / sql on Windows Server 2008 R2. I'm just trying to crawl public web sites. Sometimes this works, but sometimes I get the "Access is denied" error. Has Microsoft found a reason for this? Is this a bug? Was anybody able to solve this?
-
Wednesday, March 09, 2011 1:23 AM
Any updates on this issue?
-
Tuesday, October 18, 2011 1:40 PM
Question for Mike Hacker: did Microsoft ever confirm they have fixed this, and if so in which cumulative update?
I'm hitting the same problem on a Windows Server 2008 R2 SP 2010 box.
SP is version 14.0.6109.5002 and I still get the access denied errors on the start addresses in this content source.
I even placed a robots.txt in the root of the site I am crawling to allow crawling. The site I'm crawling is a non-SharePoint site with no security; it allows anonymous access. It is also not external to the company; it's a site we own, running on Apache.
What happens is: I stop and restart the search service in SP 2010 and crawl the site, and it gets halfway through and bombs out. When I repeat the crawl, I just get the access denied errors with no attempt to crawl. I then have to stop and restart the search service and try again, but the same thing happens: it gets halfway through and bombs out, and subsequent attempts just bounce back.
There are no proxy or firewall issues, and I don't get this issue on my dev servers (but these are not 2008 R2).
Brad
-
Tuesday, May 01, 2012 6:08 AM
-
Wednesday, December 12, 2012 7:43 PM
I had this same issue while setting up search for a site but I had already configured 2 other public sites that were operating as expected (so much for taking good notes). But I believe that I have stumbled upon the key to this.
A little background:
- A normal SharePoint web application with Windows authentication (used for authoring and content updates)
- That web application was extended (using the public URL) to the Extranet zone (the zone is only for reference, any other should be fine)
- The extended zone is configured for anonymous access for viewers
- For the extended zone (Extranet), ensure the following in Central Admin -> Application Mgmt -> Authentication Providers:
  - Enable Anonymous is CHECKED
  - Integrated Windows authentication is UNCHECKED
  - Basic authentication is UNCHECKED
  - Enable client integration is UNCHECKED
I believe that the above settings were somehow confusing the crawler and causing it to try to authenticate even though it shouldn't have been. The crawler itself is left as a SharePoint site (I created a second content source for this application), which provides some benefits over a plain Web Site type as I understand it.
This assumes that you have correctly configured anonymous authentication on your public-facing application and that you can hit it from a browser both on (and off) the server. If not, take a look at this guide; it has pretty good screenshots and explanations (http://www.topsharepoint.com/enable-anonymous-access-in-sharepoint-2010)
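Besides testing in a browser, the anonymous check can be scripted. The following is a rough Python sketch (not from the original post) that sends a GET with no credentials and reports the status code and any WWW-Authenticate challenges. The function name `anonymous_probe` and the URL in the usage comment are hypothetical.

```python
import urllib.error
import urllib.request


def anonymous_probe(url, timeout=10):
    """GET url without credentials; return (status, WWW-Authenticate values).

    No authentication handler is installed on the default opener, so a 401
    surfaces as an HTTPError instead of being retried with credentials.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, []
    except urllib.error.HTTPError as err:
        challenges = err.headers.get_all("WWW-Authenticate") or []
        return err.code, challenges


# Example usage (placeholder URL, replace with the public zone's address):
# status, challenges = anonymous_probe("http://public.example.com/")
# print(status, challenges)
```

A 200 with no challenges suggests the zone really is serving anonymous requests. A 401 listing schemes such as NTLM or Negotiate suggests that Windows authentication is still enabled on that zone.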
- Proposed As Answer by Peter Van Tilburg Wednesday, December 12, 2012 7:43 PM
- Edited by Peter Van Tilburg Wednesday, December 12, 2012 8:03 PM
- Unproposed As Answer by Hemendra Agrawal (Microsoft Community Contributor, Moderator) Tuesday, March 12, 2013 12:25 PM
-
Wednesday, February 06, 2013 12:06 AM
I had the same problem. One tip that helped me was to make sure that, in the content source settings for the search service application, you have listed the name of the public-facing site along with the internal name. I changed that, saved, and ran a full crawl, and results showed up successfully.

