What are the scenarios where full crawling is required?

  • Question

  • Hi All,

    In our environment a full crawl takes around 48 hours, so we cannot run full crawls frequently. Do we need a full crawl in the scenarios below?

    1. Adding a new content source
    2. Removing a content source
    3. Updating a crawl rule
    4. Adding a couple of new URLs to an existing content source

    I would like to know exactly which scenarios call for a full crawl.

    Regards Amit


    Amit - Our life is short, so help others to grow.....


    Sunday, October 27, 2013 4:24 AM

Answers

  • Hi Amit,

    These are the scenarios in which you really need to do a full crawl:

    Reasons for a Search service application administrator to do a full crawl for one or more content sources include the following:

    • A Search service application has just been created and the preconfigured content source Local SharePoint sites have not been crawled yet.

    • Some other content source is new and has not been crawled yet.

    • A software update or service pack was installed on servers in the farm. See the instructions for the software update or service pack for more information.

    • A Search service application administrator, site collection administrator or tenant administrator added a new managed property or changed an existing managed property. A full crawl is required for the new or changed managed property to take effect.

    • You want to detect security changes that were made to local groups on a file share after the last full crawl of the file share.

    • You want to resolve consecutive incremental crawl failures. If an incremental crawl fails a large number of consecutive times at any level in a repository, the system removes the affected content from the search index.

    • Crawl rules have been added, deleted, or modified.

    • You want to repair a corrupted search index.

    • The credentials for the user account that is assigned to the default content access account have changed. A full crawl is required only if the permissions of this user account have changed.

    The system does a full crawl even when an incremental crawl or continuous crawl is scheduled under the following circumstances:

    • A search administrator stopped the previous crawl.

    • A content database was restored, or a farm administrator has detached and reattached a content database.

    • A full crawl of the content source has never been done from this Search service application.

    • The crawl database does not contain entries for the addresses that are being crawled. Without entries in the crawl database for the items being crawled, incremental crawls cannot occur.
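The four circumstances above can be sketched as a small decision function. This is only an illustration of the rule as listed, not SharePoint's actual logic; the `ContentSourceState` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContentSourceState:
    """Hypothetical snapshot of a content source; field names are illustrative."""
    previous_crawl_stopped_by_admin: bool = False
    content_db_restored_or_reattached: bool = False
    full_crawl_ever_completed: bool = True
    crawl_db_has_entries: bool = True


def effective_crawl_type(scheduled: str, state: ContentSourceState) -> str:
    """Return the crawl type that would run, following the list above.

    `scheduled` is the crawl type the administrator scheduled,
    e.g. "incremental" or "continuous".
    """
    forces_full = (
        state.previous_crawl_stopped_by_admin
        or state.content_db_restored_or_reattached
        or not state.full_crawl_ever_completed
        or not state.crawl_db_has_entries
    )
    return "full" if forces_full else scheduled
```

For example, a scheduled incremental crawl on a content source whose content database was just reattached would come back as `"full"` under this sketch.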

    See http://technet.microsoft.com/en-us/library/jj219577.aspx#Plan_full_crawl for SP2013.

    Given your full crawl performance, you should try to improve it (if you really need to run full crawls often). You can do this by adding dedicated web front ends for crawling, applying Request Management, increasing the performance of particular servers, and so on. It is worth solving because even an incremental crawl can force a full crawl when problems occur.

    So for your list, I think you must do full crawls in these situations:

    • Adding a new content source (only the new content source)
    • Updating a crawl rule (only the content sources that the rule applies to)
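As a rough sketch of how this answer maps onto the four scenarios in the question, the lookup below returns the scope of the full crawl, or `None` where the answer does not list the change as requiring one. The `Change` enum and `full_crawl_scope` function are hypothetical names used only for illustration.

```python
from enum import Enum, auto
from typing import Optional


class Change(Enum):
    """The four scenarios from the question (names are illustrative)."""
    ADD_CONTENT_SOURCE = auto()
    REMOVE_CONTENT_SOURCE = auto()
    UPDATE_CRAWL_RULE = auto()
    ADD_URLS_TO_EXISTING_SOURCE = auto()


# Only the two changes named in the answer require a full crawl,
# and only for the content sources they affect.
FULL_CRAWL_SCOPE = {
    Change.ADD_CONTENT_SOURCE: "the new content source only",
    Change.UPDATE_CRAWL_RULE: "the content sources the rule applies to",
}


def full_crawl_scope(change: Change) -> Optional[str]:
    """Return what to fully crawl, or None if the answer lists no full crawl."""
    return FULL_CRAWL_SCOPE.get(change)


for change in Change:
    print(change.name, "->", full_crawl_scope(change) or "no full crawl listed")
```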

    That said, in my experience I always try to schedule a full crawl once a week. It keeps the index cleaner. Of course, it is very important to monitor the crawls and check the crawl logs to confirm that everything is OK.

    Rado


    Radoslav Sopon

    • Proposed as answer by Mikael Svenson (MVP) Friday, November 1, 2013 8:31 PM
    • Marked as answer by JasonGuo Saturday, November 2, 2013 7:19 AM
    Sunday, October 27, 2013 12:55 PM

All replies

  • Hi Radoslav,

    Regarding your point about improving full crawl performance: the farm holds 12 TB of documents, so it should not be surprising that a full crawl takes approximately 48 hours. I don't think it would be possible for us to run a full crawl every week.
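To put those numbers in perspective, here is a back-of-the-envelope calculation. It assumes decimal units (1 TB = 10^12 bytes) and that all 12 TB are processed within the 48 hours; the function names are illustrative.

```python
TB = 10**12  # decimal terabyte, an assumption for this estimate
MB = 10**6


def crawl_throughput_mb_per_s(total_bytes: float, hours: float) -> float:
    """Average sustained throughput, in MB/s, needed to crawl total_bytes in hours."""
    return total_bytes / (hours * 3600) / MB


def share_of_week(hours: float) -> float:
    """Fraction of a 7-day week that a crawl of the given duration occupies."""
    return hours / (7 * 24)


throughput = crawl_throughput_mb_per_s(12 * TB, 48)
weekly_share = share_of_week(48)
print(f"{throughput:.1f} MB/s average, {weekly_share:.0%} of each week")
```

Under these assumptions the crawl averages roughly 69 MB/s, and a weekly 48-hour full crawl would occupy about 29% of every week.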



    Sunday, November 3, 2013 1:54 AM
  • Hi Amit,

    I fully understand. It is good practice, but it does not have to fit your situation. By the way, even a 12 TB crawl can be sped up by proper sizing, but it costs money to invest in more servers and licenses. In your situation it may be sufficient to do full crawls in the situations stated above, plus during each maintenance window.


    Radoslav Sopon

    Sunday, November 3, 2013 8:48 PM
  • Thank you, Amit. Very useful information.

    Raga Kandimalla

    Friday, March 7, 2014 3:02 PM
  • Hi Raga, instead of writing thanks, you can mark the reply as helpful.

    Regards Restless Spirit

    Tuesday, March 11, 2014 6:10 AM
  • helpful

    Raga Kandimalla

    Wednesday, July 23, 2014 4:40 PM
  • Hello Radoslav

    Nice information. 

    I have one more question. Suppose we change the URL of an existing content source and update the settings in Alternate Access Mapping (AAM):

    Old : http://prepord.sharepoint.com

    New : http://sharepoint.com 

    Do we still need to do a full crawl, or is there another way to reuse the existing crawled items instead of crawling everything again?

    Thanks in advance.

    Regards,

    Harish  

    Thursday, January 28, 2016 10:39 AM