Azure Search indexer and maximum block size in a blob container


  • Hi,

    I have a question about Blob Storage and Azure Search. I am using the Azure Search service to index the files stored in a blob container.

    According to the Blob Storage documentation, the maximum block size is 100 MB. If we upload files of 1 GB or more, how are these larger files handled? Are they divided into multiple 100 MB files?

    Second, what is the maximum file size that an Azure Search indexer can extract? With larger files, we can't create the indexer.

    Thanks in advance for your help.

    Regards,

    Gohar 

    Monday, December 31, 2018 3:56 PM

All replies

  • Hi Gohar,

    Regarding the first question, refer to https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs

    "Storage clients default to a 128 MB maximum single blob upload, settable using the SingleBlobUploadThresholdInBytes property of the BlobRequestOptions object. When a block blob upload is larger than the value in this property, storage clients break the file into blocks. You can set the number of threads used to upload the blocks in parallel on a per-request basis using the ParallelOperationThreadCount property of the BlobRequestOptions object. 

    When you upload a block to a blob in your storage account, it is associated with the specified block blob, but it does not become part of the blob until you commit a list of blocks that includes the new block's ID. New blocks remain in an uncommitted state until they are specifically committed or discarded. Writing a block does not update the last modified time of an existing blob. 

    Block blobs include features that help you manage large files over networks. With a block blob, you can upload multiple blocks in parallel to decrease upload time. Each block can include an MD5 hash to verify the transfer, so you can track upload progress and re-send blocks as needed. You can upload blocks in any order, and determine their sequence in the final block list commitment step. You can also upload a new block to replace an existing uncommitted block of the same block ID. You have one week to commit blocks to a blob before they are discarded. All uncommitted blocks are also discarded when a block list commitment operation occurs but does not include them. "
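    To illustrate the block model in the quoted documentation: a 1 GB file is not stored as several separate 100 MB files. It is uploaded as blocks, and committing a block list turns those blocks, in the listed order, into one blob. Below is a minimal in-memory sketch using only the Python standard library. `InMemoryBlockBlob`, `split_into_blocks`, `make_block_id`, `md5_b64` and `block_count` are hypothetical names, not part of any Azure SDK. The sketch does not model re-committing already committed blocks or the one-week expiry of uncommitted blocks, and the 4 KiB block size only keeps the demo small.

```python
import base64
import hashlib
from dataclasses import dataclass, field

MIB = 1024 * 1024


def block_count(total_bytes: int, block_size: int) -> int:
    # Ceiling division: a partial last block still counts as a block.
    return -(-total_bytes // block_size)


def make_block_id(index: int) -> str:
    # Block IDs are Base64 strings and every ID in one blob must have the
    # same length, so the index is zero-padded before encoding.
    return base64.b64encode(f"block-{index:06d}".encode()).decode()


def md5_b64(chunk: bytes) -> str:
    # Base64-encoded MD5 digest of one block, used to verify the transfer.
    return base64.b64encode(hashlib.md5(chunk).digest()).decode()


def split_into_blocks(data: bytes, block_size: int) -> list[tuple[str, bytes, str]]:
    """Split a payload into (block_id, chunk, md5) tuples in file order."""
    blocks = []
    for index, start in enumerate(range(0, len(data), block_size)):
        chunk = data[start:start + block_size]
        blocks.append((make_block_id(index), chunk, md5_b64(chunk)))
    return blocks


@dataclass
class InMemoryBlockBlob:
    """Hypothetical stand-in that models staged (uncommitted) vs committed blocks."""
    uncommitted: dict[str, bytes] = field(default_factory=dict)
    committed: list[tuple[str, bytes]] = field(default_factory=list)

    def stage_block(self, block_id: str, chunk: bytes, md5: str) -> None:
        # Reject a block whose content does not match its MD5 hash.
        if md5_b64(chunk) != md5:
            raise ValueError(f"MD5 mismatch for block {block_id}")
        # Re-staging the same ID replaces the earlier uncommitted block.
        self.uncommitted[block_id] = chunk

    def commit_block_list(self, block_ids: list[str]) -> None:
        # The order of this list, not the upload order, defines the content.
        self.committed = [(bid, self.uncommitted[bid]) for bid in block_ids]
        # Staged blocks left out of the list are discarded.
        self.uncommitted.clear()

    def read(self) -> bytes:
        # The blob's content is the committed blocks joined in list order.
        return b"".join(chunk for _, chunk in self.committed)


if __name__ == "__main__":
    payload = bytes(range(256)) * 40           # 10,240 bytes of sample data
    blocks = split_into_blocks(payload, 4096)  # 3 blocks: 4096, 4096, 2048 bytes
    blob = InMemoryBlockBlob()
    for block_id, chunk, md5 in reversed(blocks):  # upload order does not matter
        blob.stage_block(block_id, chunk, md5)
    blob.commit_block_list([block_id for block_id, _, _ in blocks])
    assert blob.read() == payload
    # A 1 GiB file split into 100 MiB blocks is one blob made of 11 blocks.
    print(block_count(1024 * MIB, 100 * MIB))  # 11
```

    The reversed staging loop reflects the documented point that blocks can be uploaded in any order. Only the final block list decides the sequence.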

    As far as the second one goes, the limits are listed at https://docs.microsoft.com/en-us/azure/search/search-limits-quotas-capacity

    hth
    Marcin

    Monday, December 31, 2018 4:09 PM
  • Hi Sumanth,

    Thank you so much for sharing this information. I have a follow-up on my second question.

    I am using the S2 service tier, which has an indexer limit of 256 MB maximum blob size. Can one indexer process multiple 256 MB blobs of the same type, or can an indexer extract data from only a single blob? In other words, if I upload multiple 256 MB files to the container, does one indexer work with all of them, or is it limited to one file?

    Thanks.

    Regards,

    Gohar

    Wednesday, January 2, 2019 8:59 AM
  • The limit applies to each blob: a single blob can be no more than 256 MB in its original size. This limit is unrelated to the size of the text that is extracted.

    Wednesday, January 2, 2019 4:20 PM
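    The answer above describes a per-blob check. One indexer indexes all the blobs in the container, so many blobs of up to 256 MB each are fine, even if their combined size is far above 256 MB. Only individual blobs over the limit are a problem. Below is a minimal Python sketch of that rule, using only the standard library. `BlobInfo` and `partition_by_limit` are hypothetical names, and in practice the sizes would come from a container listing. Reading "256 MB" as 256 × 2^20 bytes is an assumption. The sketch does not model what the indexer actually does with an oversized blob.

```python
from dataclasses import dataclass

MIB = 1024 * 1024

# Per-blob limit for the S2 tier as stated in this thread. Reading "256 MB"
# as 256 * 2**20 bytes is an assumption; other tiers have different limits.
S2_MAX_BLOB_BYTES = 256 * MIB


@dataclass(frozen=True)
class BlobInfo:
    name: str
    size_bytes: int  # original (stored) size, not the size of extracted text


def partition_by_limit(
    blobs: list[BlobInfo], max_blob_bytes: int = S2_MAX_BLOB_BYTES
) -> tuple[list[BlobInfo], list[BlobInfo]]:
    """Return (within_limit, too_large), applying the limit to each blob separately."""
    within: list[BlobInfo] = []
    too_large: list[BlobInfo] = []
    for blob in blobs:
        if blob.size_bytes <= max_blob_bytes:
            within.append(blob)
        else:
            too_large.append(blob)
    return within, too_large


if __name__ == "__main__":
    container = [
        BlobInfo("report-1.pdf", 200 * MIB),
        BlobInfo("report-2.pdf", 256 * MIB),
        BlobInfo("archive.zip", 300 * MIB),
    ]
    ok, rejected = partition_by_limit(container)
    print([b.name for b in ok])        # ['report-1.pdf', 'report-2.pdf']
    print([b.name for b in rejected])  # ['archive.zip']
```

    The two accepted blobs total 456 MiB, which is more than the limit. The example shows that the check applies to each blob rather than to the container as a whole.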