How often should I have a full crawl running?

    Question

  • Hi,

    Still getting used to SP. How often should I have my Full Crawl running? At the moment, as set up by whoever configured it, it runs daily at 3am but takes up to 12 hours! It's slowing down my system as well.

    The incremental crawl runs every hour and only takes a few minutes. What are best practices for this in your opinions? Thanks

    SJ

    Friday, January 27, 2012 2:36 PM

Answers

  • Yes it really depends on the data increments as suggested by Marek.

    I would suggest scheduling an incremental crawl on an hourly basis and a full crawl on a weekly basis.

    Also have a separate machine/VM for performing the crawl [Index Server]. If the amount of data is huge (in the millions of items), then go for partitioned index servers.

     


    Amalaraja Fernando,
    SharePoint Architect
    This post is provided "AS IS" with no warranties and confers no rights.
    • Edited by Amalaraja Fernando Friday, January 27, 2012 3:30 PM
    • Marked as answer by SoapyJ Friday, January 27, 2012 3:44 PM
    Friday, January 27, 2012 3:28 PM
  • Indeed, it depends on how your infrastructure is set up, as Marek said.  One thing to keep in mind (again, depending on your overall infrastructure and farm configuration), is to look at the crawler impact rules and how to manage them to best suit your needs.

    http://technet.microsoft.com/en-us/library/cc262926.aspx#section5

    I'm not sure I would run a full crawl at 3AM daily...especially not if it's taking 12 hours (which means it runs through most of your local business day, as it would end around 3PM...yikes!).  I'm not sure if you are geographically spread out (your user base) or just locally confined (say, in the same city), but I typically find that there is no need for me to run a Full Crawl except on the weekends*

    *Exceptions to this rule are holidays, or days when the servers are down for scheduled monthly maintenance (updates, patches, service packs, etc - install updates, ensure everything works, and then initiate a full crawl manually through PowerShell or Central Administration).
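
    To make the timing concrete, here is a minimal Python sketch of the arithmetic: a 12-hour crawl that starts at 3AM ends at 3PM. The 9AM-5PM business day, the calendar date, and the function name `crawl_overlap` are assumptions for illustration only; none of this touches SharePoint itself.

```python
from datetime import datetime, timedelta

def crawl_overlap(start, duration_hours, business_open=9, business_close=17):
    """Return (end_time, hours of overlap with business hours on the start date).

    Assumes the business day is business_open..business_close (24-hour clock)
    and only considers the calendar day on which the crawl starts.
    """
    end = start + timedelta(hours=duration_hours)
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    open_time = midnight + timedelta(hours=business_open)
    close_time = midnight + timedelta(hours=business_close)
    # The overlap is the intersection of the two time windows, if any.
    overlap_start = max(start, open_time)
    overlap_end = min(end, close_time)
    overlap = (overlap_end - overlap_start).total_seconds() / 3600
    return end, max(0.0, overlap)

# Hypothetical date; only the 3AM start and 12-hour duration come from the thread.
crawl_start = datetime(2012, 1, 27, 3, 0)
crawl_end, overlap_hours = crawl_overlap(crawl_start, 12)
print(crawl_end.strftime("%I:%M %p"), overlap_hours)
```

    Under those assumptions, 6 of the 8 business hours fall inside the crawl window, which is why moving the full crawl to the weekend helps.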

    In my situation, with multiple farms that are all of a "medium" size (Web Front End, Central Admin Application Server, SQL server), I run incremental crawls every 2 hours.  These are staggered across various content sources:

    Example:

    Monday-Friday

    (This is just an example: I don't run that many incrementals during these hours, as 90% of our users are in our time zone, so there isn't much to update as external users are readers, and not contributors and above).

    Content Source 1 starts at 1AM, Incremental.

    Content Source 2 starts at 1:30AM, Incremental.

    Content Source 3 starts at 2AM, Incremental.

    Content Source 4 starts at 2:30AM, Incremental.

    Content Source 1 starts at 3AM, Incremental.

    Friday Evening, Saturday, Sunday

    Content Source 1 starts at 12AM (Friday/Saturday/Sunday at Midnight), Full Crawl.

    Content Source 2 starts at 6AM Saturday/Sunday, Full Crawl.

    Content Source 3 starts at 12PM Saturday/Sunday, Full Crawl.

    Content Source 4 starts at 6PM Saturday/Sunday, Full Crawl.

    And then at 6PM on Sunday, a full farm backup completes (we use other backup solutions to provide snapshots every 3 hours of the servers, should we need to fall back on those, but I still keep farm backups around for 90 days).
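
    The staggered start times in the example above follow a fixed spacing (30 minutes between incrementals, 6 hours between full crawls). A small Python sketch can generate such a table for planning purposes. The helper name `staggered_starts` and the calendar dates are hypothetical, and the output is only a plan on paper; the schedules themselves would still be set per content source in Central Administration or PowerShell.

```python
from datetime import datetime, timedelta

def staggered_starts(sources, first_start, gap):
    """Pair each content source with a start time, spaced `gap` apart."""
    return [(name, first_start + i * gap) for i, name in enumerate(sources)]

sources = ["Content Source 1", "Content Source 2",
           "Content Source 3", "Content Source 4"]

# Weekday incrementals: first at 1AM, 30 minutes apart (dates are arbitrary).
incrementals = staggered_starts(
    sources, datetime(2012, 1, 30, 1, 0), timedelta(minutes=30))

# Weekend full crawls: first at midnight, 6 hours apart.
fulls = staggered_starts(
    sources, datetime(2012, 1, 28, 0, 0), timedelta(hours=6))

for kind, plan in (("Incremental", incrementals), ("Full Crawl", fulls)):
    for name, start in plan:
        print(f"{name} starts at {start:%I:%M %p}, {kind}.")
```

    Keeping the gap as a parameter makes it easy to widen the spacing if a crawl regularly runs into the next content source's start time.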

    I hope this helps - I haven't had any issues with it as of yet, in my configuration.

     


    Friday, January 27, 2012 3:31 PM

All replies

  • It depends on your infrastructure and your data increments. You can also try to optimize your search topology so that crawls run faster: http://blogs.msdn.com/b/joelo/archive/2007/12/05/10-things-to-optimize-your-sharepoint-server-indexing.aspx


    Marek Chmel, WBI Systems (MCTS, MCITP, MCT, CCNA)
    Please Mark As Answer if my post solves your problem or Vote As Helpful if a post has been helpful for you.
    Friday, January 27, 2012 2:43 PM
  • Thanks everyone. ThatSharePoint Guy, that sounds like a good plan to me. For starters, I will schedule the full crawl weekly. I do not foresee it being an issue having it every week. The fact that it takes 12 hours to complete during business hours is what's slowing down the system. Thank you so much
    Friday, January 27, 2012 3:40 PM
  • You're welcome, SoapyJ!  After you get it set up, don't forget to get back in there and verify that it's working.  Check out any and all errors that get returned during the crawls, and then sort those issues out - you'll be amazed at how nicely everything works after a little healthy tweaking.

    The best of luck to you!

    Monday, January 30, 2012 3:31 PM