The item has been truncated in the index because it exceeds the maximum size

    Question

  • I'm having a problem indexing a big PDF with more than 500,000 characters. This is the error text:

    The item has been truncated in the index because it exceeds the maximum size. ( Item truncated. Field=body, Occurrences=105354, Chars=524299; )

    Then I tried this:

    $ssa = Get-SPEnterpriseSearchServiceApplication
    $mp = Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Identity "body"
    $mp.MaxCharactersInPropertyStoreForRetrieval = 2097152
    $mp.Update()

    I restarted the server and ran a full reindex...

    and the error still persists.

    Do you have any advice I can try?

    Thanks in advance

    Wednesday, February 13, 2013 10:09 AM

Answers

  • Confirmed, it's a SharePoint bug. The parameter to change is MaxCharactersInPropertyStoreIndex, but it is capped at 450. It can be changed directly in the database, but the SharePoint farm will then be unsupported. I'm starting the escalation process to obtain a hotfix.

    Now the big question... am I the first guy in the world trying to index a document with more than 500,000 characters? I can't believe it! :(


    Tuesday, February 19, 2013 11:55 AM

All replies

  • How large is the PDF in question? The default maximum download size in 2013 is 40 MB. You can increase this using PowerShell, but it will affect your crawl performance. The following link still applies to 2013.

    http://blogs.technet.com/b/brent/archive/2010/07/19/sharepoint-server-2010-maxdownloadsize-and-maxgrowfactor.aspx
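
    A minimal sketch of the PowerShell change mentioned above, assuming the MaxDownloadSize property described in the linked post still applies to 2013 and is expressed in megabytes. The value 64 is only an illustrative choice, and the new limit is not expected to take effect until the search service has been restarted and the content recrawled.

    ```powershell
    # Run in the SharePoint Management Shell on a farm server.
    $ssa = Get-SPEnterpriseSearchServiceApplication

    # Show the current limit (assumed to be in MB).
    $ssa.GetProperty("MaxDownloadSize")

    # Raise the limit; 64 is an illustrative value, not a recommendation.
    $ssa.SetProperty("MaxDownloadSize", 64)
    $ssa.Update()
    ```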


    Blog | SharePoint Field Notes Dev Tool | ClassMaster

    Wednesday, February 13, 2013 6:41 PM
  • Hi,

    Try changing the MaxCharactersInPropertyStoreIndex property instead. This relates to how much data may be stored while the other relates to how much data is returned for a result on query.

    Thanks,
    Mikael Svenson


    Search Enthusiast - SharePoint MVP/MCT/MCPD - If you find an answer useful, please up-vote it.
    http://techmikael.blogspot.com/
    Author of Working with FAST Search Server 2010 for SharePoint

    Wednesday, February 13, 2013 9:35 PM
  • That's not the problem... the PDF is 1.5 MB. SharePoint 2013 can download and index the file, but it cannot save more than 524299 chars in the index.
    Thursday, February 14, 2013 8:20 AM
  • Tried it... doesn't work. If I try

    $ssa = Get-SPEnterpriseSearchServiceApplication

    $mp = Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Identity "body"
    $mp.MaxCharactersInPropertyStoreIndex = 2097152

    it says the value must be between 0 and 450.

    I can't believe I'm the only guy with this problem... no one else in the entire world is trying to index files with more than 500,000 chars... :(

    Thanks for the help guys!



    Thursday, February 14, 2013 8:25 AM
  • This is a known issue that I wrote about in 2007; it might help you: http://www.loisandclark.eu/Pages/indexlargefiles.aspx

    Back then, if a document exceeded 16 MB, its content wasn't indexed at all, but the limit could be raised by changing a registry setting:

    HKLM\Software\Microsoft\Office Server\12.0\Search\Global\GatheringManager\MaxDownloadSize

    Note: Of course, you need to update the "12.0" part in the path.


    Kind regards,
    Margriet Bruggeman

    Lois & Clark IT Services
    web site: http://www.loisandclark.eu
    blog: http://www.sharepointdragons.com

    Thursday, February 14, 2013 1:06 PM
    Moderator
  • Hi,

    This has nothing to do with the file size, but with the size of the text after extraction. In FS4SP you could set how large a blob the index would store, but this setting does not seem to be there for 2013 Search.

    It could very well be that this is a setting for the CTS pipeline, which does the extraction and inserts the data into the index. The question is how you can change it :)

    Thanks,
    Mikael Svenson



    Thursday, February 14, 2013 1:43 PM
  • Hi,

    I opened a support incident with Microsoft... I will post the results here to help others with the same problem.

    Thanks,

    Felix Gardel

    Monday, February 18, 2013 10:23 AM
  • Hi,

    I think most people are just happy with the truncation. 500k of pure text is quite a lot, and that's what we are talking about here: the size after the text has been extracted. 500k of text should amount to around 200 pages in Word.

    I'm not saying there is no important information that far into a document, but I guess the use case is rare :)

    Thanks,
    Mikael Svenson



    Wednesday, March 06, 2013 1:27 PM
  • Hi,

    There are several steps in processing your file. One of them is document parsing, which allows a maximum of 2,000,000 characters of output. Beyond that limit, no more text is extracted. At the next stage, the parsed content is formatted and put into a specific model. This is the exact place where the limit of 450b truncated your text.

    This limit was recently increased to match the document parser's maximum output size, so the change will take effect soon. However, you will still not be able to have more than 2 MB of text.

    Sincerely,

    Ievgeniia

    Friday, April 05, 2013 7:52 AM
  • Please note that changing the MaxCharactersInPropertyStoreIndex parameter requires the April 2013 CU. Otherwise, you'll get the error described.

    More information on this post.
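
    A sketch of what the change could look like once the April 2013 CU is installed, reusing the cmdlets from the original question. The value 2000000 is an assumption taken from the parser limit mentioned earlier in the thread, and a full crawl is still needed afterwards for existing items to be reindexed.

    ```powershell
    # Run in the SharePoint Management Shell on a farm server.
    $ssa = Get-SPEnterpriseSearchServiceApplication
    $mp = Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Identity "body"

    # Assumed value: matches the 2,000,000-character parser limit cited in this thread.
    # Before the April 2013 CU, this assignment is rejected (allowed range 0 - 450).
    $mp.MaxCharactersInPropertyStoreIndex = 2000000

    # Keep the retrieval limit in line with the index limit, as in the original attempt.
    $mp.MaxCharactersInPropertyStoreForRetrieval = 2000000
    $mp.Update()
    ```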


    Corey Roth - SharePoint Server MVP blog: www.dotnetmafia.com twitter: @coreyroth | SP2 Apps


    Friday, June 21, 2013 1:34 PM
    Answerer
  • I am going to try a different parser, such as the Adobe iFilter, and see whether it fixes the parsing limitations within SharePoint, at least for PDF files.

    KG

    Thursday, October 13, 2016 7:35 PM