Maximum File Content Size From Document Cracking

  • Question

  • Hello,

    What is the maximum file content size stored in an Azure Search index when the indexer performs document cracking on blob storage files?

    Thanks,

    Greg

    Tuesday, February 16, 2016 1:03 AM

All replies

  • Hi Greg,

    There are two relevant limits that depend on your service tier:

    1. The maximum file size that the blob indexer will attempt to process. For the free tier, it's 16 MB. For S1, it's 128 MB. For S2, it's 256 MB.

    2. The maximum number of characters we'll extract from one blob. For the free tier, it's 32*1024 (32,768) characters. For S1 and S2, it's 4 million characters.
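
    The two limits above can be written down as a small lookup table for pre-checking blobs before indexing. This is only a sketch: the names `TierLimits`, `LIMITS`, and `check_blob` are hypothetical, 1 MB is assumed to mean 1024*1024 bytes, "4 million" is assumed to mean 4,000,000, and the values are the ones quoted in this reply, which may change over time.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: the thread does not say whether MB is 10^6 or 2^20 bytes


@dataclass(frozen=True)
class TierLimits:
    max_blob_bytes: int        # largest blob the indexer will attempt to process
    max_extracted_chars: int   # most characters extracted from a single blob


# Values as quoted in this thread; treat them as a snapshot, not a guarantee.
LIMITS = {
    "free": TierLimits(16 * MB, 32 * 1024),
    "S1": TierLimits(128 * MB, 4_000_000),
    "S2": TierLimits(256 * MB, 4_000_000),
}


def check_blob(tier: str, blob_bytes: int, text_chars: int) -> dict:
    """Report whether a blob fits within both limits for the given tier."""
    limits = LIMITS[tier]
    return {
        "within_size_limit": blob_bytes <= limits.max_blob_bytes,
        "within_char_limit": text_chars <= limits.max_extracted_chars,
    }


if __name__ == "__main__":
    # A 20 MB blob with 1,000 characters of text, checked against the free tier.
    print(check_blob("free", 20 * MB, 1_000))
```

    The sketch only reports whether each limit is exceeded; it does not model what the indexer does with a blob that exceeds one.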

    HTH!


    Thanks! Eugene Shvets Azure Search

    • Proposed as answer by Ealsur Tuesday, February 16, 2016 12:16 PM
    • Marked as answer by Greg Goodone Tuesday, February 16, 2016 3:10 PM
    Tuesday, February 16, 2016 3:14 AM
    Moderator
  • Eugene,

    As a follow-up question, I've read that the maximum indexer run time is 3 minutes for the free tier. Does this mean that documents that fall outside the 3-minute maximum run time will be picked up during the next indexer run, or will they never be indexed?

    For context, we have a fairly small file library (2,000 files), but it includes some large, very text-heavy files. We don't update the library often. I'm trying to determine which tier of Azure Search will best meet our requirements.

    Thanks,
    Greg

    Tuesday, February 16, 2016 3:10 PM
  • Hi Greg,
    The unit of incremental progress for the blob indexer is a batch of 10 blobs (by default). As long as at least one batch is processed during that 3-minute interval, the indexer makes forward progress. By looking at how many batches get processed during each indexer invocation, you can estimate how long it will take to index your entire library. You can see your indexer execution history (which includes the number of processed blobs, start and end times, and error details, if any) in the portal and by using the Get Indexer Status API.
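
    The estimate described above can be sketched as simple arithmetic. The function name `estimate_remaining_runs` is hypothetical, and the sketch assumes that the per-run counts come from the execution history and that each run processes blobs not yet indexed (no reprocessing).

```python
import math


def estimate_remaining_runs(total_blobs: int, processed_per_run: list[int]) -> int:
    """Estimate how many more indexer runs are needed to cover the library.

    processed_per_run holds the number of blobs processed in each past run,
    as read from the indexer execution history.
    """
    already_done = sum(processed_per_run)
    if already_done == 0:
        raise ValueError("need at least one run that processed blobs")
    average_per_run = already_done / len(processed_per_run)
    remaining = max(total_blobs - already_done, 0)
    return math.ceil(remaining / average_per_run)


if __name__ == "__main__":
    # Hypothetical history: three runs that processed 3, 4, and 5 batches of 10 blobs.
    print(estimate_remaining_runs(2000, [30, 40, 50]))
```

    Multiplying the result by the interval between scheduled runs gives a rough total time. Large, text-heavy files will lower the per-run count, so it is worth re-estimating after a few more runs.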

    HTH!


    Thanks! Eugene Shvets Azure Search

    Wednesday, February 17, 2016 2:07 AM
    Moderator
  • Hi Eugene,

    I have some clarifying questions about the limits mentioned above.

    1) If the text from a blob exceeds the maximum number of characters, what happens? Is only the text up to the limit indexed, or something else?

    2) What are the limits for the S3 tier?

    Wednesday, April 20, 2016 4:48 PM