Search keeps trying to crawl the internet, when the server has no internet access

  • Wednesday, April 18, 2012 9:21 PM
     
     

    We, like many of you, have a SharePoint server that has NO internet access. People link to sites such as howtogeek.com or cnn.com, and when SharePoint crawls our site it tries to crawl those sites too, even though the content source is set to "Crawl only the SharePoint Site of each start address". I then created a content source of type "web" rather than "SharePoint" and set it to "Only crawl within the server of each start address". I've also configured it as "custom" with "0" server hops. It still goes out to the internet. I ran a full crawl with verbose logging to find out where the links are, but the log never shows many of the sites that appear in our firewall log. I know search is responsible, though, because the traffic doesn't happen until I run a full crawl.

    I've disabled all federated locations, etc., and I just can't figure it out.
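
One way to see which outside hosts a page actually links to is to pull the link targets out of its HTML and drop the internal ones. The following is only a minimal sketch using the Python standard library; the sample markup and the internal host name `sharepoint.example.local` are made up for illustration and are not taken from this environment.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href/src targets found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def external_hosts(html_text, internal_hosts):
    """Return the sorted host names referenced in html_text that are not internal."""
    collector = LinkCollector()
    collector.feed(html_text)
    internal = {h.lower() for h in internal_hosts}
    hosts = set()
    for target in collector.targets:
        host = urlparse(target).hostname  # None for relative links
        if host and host.lower() not in internal:
            hosts.add(host.lower())
    return sorted(hosts)


# Hypothetical page content and internal host name, for illustration only.
sample = (
    '<a href="http://www.cnn.com/news">CNN</a> '
    '<a href="/sites/team">Team</a> '
    '<img src="http://sharepoint.example.local/logo.png">'
)
print(external_hosts(sample, ["sharepoint.example.local"]))
```

Comparing a list like this against the firewall log may show whether the unexpected hosts come from page links or from somewhere else.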

    Bryan

All Replies

  • Wednesday, April 25, 2012 7:41 AM
    Moderator
     
     
    Hi Bryan,

    Thank you for your question.
    I am trying to involve someone familiar with this topic to further look at this issue.

    Thanks,
    Lhan Han
  • Wednesday, April 25, 2012 2:38 PM
     
     
    Appreciate it.  Would love to get to the bottom of this problem.
  • Friday, April 27, 2012 3:19 PM
     
     

    Hello Bryan,

    Do you have a proxy server set up?

    Thanks!

    Regards,

    Shruti

  • Friday, April 27, 2012 3:31 PM
     
     

    No. We have a firewall, but none of our servers are set up to be allowed through it. There is a client that is installed on the desktops, but per our normal procedures servers are not even given the client.

    Bryan

  • Tuesday, May 08, 2012 2:51 PM
     
     

    Hello Bryan,

    A good way to check: look through the IIS logs and see whether the content access account is accessing sites over the internet.
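
As a rough sketch of that check, the snippet below filters W3C-format IIS log lines by the `cs-username` field. It assumes the log carries a `#Fields:` header and that `cs-username` is among the logged fields; the account name `CONTOSO\svc_crawl` and the sample lines are hypothetical. Keep in mind that IIS logs record requests received by the local web server, so this shows what the account requested from this server.

```python
def requests_by_user(log_lines, username):
    """Yield one dict per W3C log entry whose cs-username matches username."""
    fields = []
    wanted = username.lower()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns used by the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split(" ")
        if len(values) != len(fields):
            continue  # skip malformed or truncated lines
        entry = dict(zip(fields, values))
        if entry.get("cs-username", "-").lower() == wanted:
            yield entry


# Hypothetical log excerpt and account names, for illustration only.
sample_log = [
    "#Fields: date time cs-method cs-uri-stem cs-username c-ip sc-status",
    r"2012-05-08 14:51:02 GET /sites/team/default.aspx CONTOSO\svc_crawl 10.0.0.5 200",
    r"2012-05-08 14:51:03 GET /sites/team/default.aspx CONTOSO\jsmith 10.0.0.9 200",
]

for entry in requests_by_user(sample_log, r"CONTOSO\svc_crawl"):
    print(entry["date"], entry["time"], entry["cs-uri-stem"])
```

To run it against a real log, pass an open file object instead of `sample_log`.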

    Thanks!

    Regards,

    Shruti

    • Marked As Answer by Shruti-MSFT Monday, May 14, 2012 3:04 PM
    • Unmarked As Answer by Bryan - COE Monday, May 14, 2012 4:12 PM
  • Monday, May 14, 2012 4:13 PM
     
     
    ZERO entries. I thought I had updated this post, but I must not have. There are absolutely zero entries in the IIS logs for any of the sites listed in my firewall logs. BUT the firewall logs show no entries unless search is in the process of crawling.
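
The observation that the firewall entries only appear during a crawl can be checked more systematically by comparing firewall timestamps with the crawl start and end times. This is only a sketch: the timestamps, hosts, and crawl window below are invented sample data, and real firewall and crawl logs would need their own parsing.

```python
from datetime import datetime


def in_any_window(ts, windows):
    """Return True if ts falls inside any (start, end) window, inclusive."""
    return any(start <= ts <= end for start, end in windows)


def split_by_crawl(firewall_events, crawl_windows):
    """Split (timestamp, host) events into those during and outside a crawl."""
    during, outside = [], []
    for ts, host in firewall_events:
        bucket = during if in_any_window(ts, crawl_windows) else outside
        bucket.append((ts, host))
    return during, outside


# Invented sample data, for illustration only.
crawl_windows = [(datetime(2012, 5, 14, 1, 0), datetime(2012, 5, 14, 3, 30))]
firewall_events = [
    (datetime(2012, 5, 14, 1, 15), "www.cnn.com"),
    (datetime(2012, 5, 14, 2, 40), "www.howtogeek.com"),
    (datetime(2012, 5, 14, 9, 5), "www.cnn.com"),
]

during, outside = split_by_crawl(firewall_events, crawl_windows)
print(len(during), "events during a crawl;", len(outside), "outside")
```

If every blocked request lands inside a crawl window, that supports the idea that the crawl process, or something it triggers, is the source.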
    • Marked As Answer by Shruti-MSFT Monday, May 14, 2012 9:43 PM
    • Unmarked As Answer by Shruti-MSFT Monday, May 14, 2012 9:43 PM
  • Monday, May 14, 2012 9:54 PM
     
     

    Hello Bryan,

    It is strange that there are no entries in the IIS logs for the content access account. It looks like a network issue. An analysis of a network trace would be helpful here. I would suggest opening a ticket with MS support for further research into the issue.

    Thanks!

    Regards,

    Shruti

  • Tuesday, May 15, 2012 1:56 PM
     
     

    Hello Bryan,

    I posted a reply too, but it looks like it did not get saved. It is strange that the IIS logs do not have any entries for the content access account accessing the sites over the internet. I would suggest collecting a network trace and analyzing it.

    Regards,

    Shruti