Getting an error while retrieving all blob names and sizes

  • Question

  • I want to get the names and sizes of all the blobs in a specified container. We have several million blobs, and while reading them the following error is raised:

    "Unable to read data from the transport connection: The connection was closed."

    The code below works fine if the container holds a small number of blobs. Please tell me how to resolve this issue, or what the most efficient way to get the blob properties is.

     

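      // blobClient and lstBlobProp are fields declared elsewhere in the class;
      // BlobProperties here is the poster's own type with BlobName and BlobSize.
      public static void GetBlobs(String containerName)
            {
                CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(
                    ConfigurationSettings.AppSettings["CAbinStorage"]);
                blobClient = cloudStorageAccount.CreateCloudBlobClient();
                CloudBlobContainer cloudBlobContainer = blobClient.GetContainerReference(containerName);
                lstBlobProp = new List<BlobProperties>();
                // ListBlobs() is enumerated lazily, so pages are fetched as the loop advances.
                foreach (var item in cloudBlobContainer.ListBlobs())
                {
                    CloudBlockBlob cbBlob = item as CloudBlockBlob;
                    if (cbBlob != null)
                    {
                        BlobProperties objProp = new BlobProperties();
                        objProp.BlobName = cbBlob.Uri.Segments[2];
                        objProp.BlobSize = cbBlob.Properties.Length.ToString();
                        // Keep the result; without this the collected properties are discarded.
                        lstBlobProp.Add(objProp);
                    }
                }
            }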
    
    
    



    Sudhesh. G
    http://gurucoders.blogspot.com
    Thursday, October 13, 2011 12:22 PM

Answers

  • Hi Sudhesh - Thanks for the question.  Here's some additional information.

    1) Can you use Wireshark to find out what timeout is being sent to the server?  That error is the one you get when a request takes longer than the timeout.

    2) As returning millions of results in one response would be impractical, the blob service uses paging to allow large numbers of results to be returned over the course of multiple requests/responses.  The IEnumerable returned from ListBlobs handles paging automatically.  By default, it will get 5000 results at a time, and then fetch the next 5000 when it needs them.  So the code you wrote, using foreach, should work just fine, so long as the timeouts being sent with each request are long enough.  Do you see any results being returned at this point?  Or do you get that error before you get even the first result?  For a million blobs, you'll be making at least 200 requests, so you'll see some variance in the time it takes for each page to be downloaded.
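
    To make the paging behaviour described in point 2 concrete, here is a minimal, self-contained sketch. It does not call the storage library at all: FakeBlobService, GetPage and ListAll are hypothetical names invented for this illustration, and the fake service simply hands out numbered names 5000 at a time. It only shows how a lazy enumerator can hide the per-page requests from a foreach loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the blob service (not the real storage API):
// returns one page of names plus the index to continue from,
// or null when the listing is complete.
static class FakeBlobService
{
    public const int PageSize = 5000;
    public static int RequestCount;

    public static (List<string> Items, int? Next) GetPage(int total, int start)
    {
        RequestCount++;
        int count = Math.Min(PageSize, total - start);
        var items = Enumerable.Range(start, count).Select(i => "blob-" + i).ToList();
        int end = start + count;
        return (items, end < total ? end : (int?)null);
    }
}

static class PagingDemo
{
    // Lazily walks every page: the next request is issued only after the
    // caller has consumed the current page.
    public static IEnumerable<string> ListAll(int total)
    {
        int? next = 0;
        while (next.HasValue)
        {
            var page = FakeBlobService.GetPage(total, next.Value);
            foreach (var name in page.Items)
            {
                yield return name;
            }
            next = page.Next;
        }
    }

    static void Main()
    {
        int seen = ListAll(1000000).Count();
        Console.WriteLine(seen + " blobs in " + FakeBlobService.RequestCount + " requests");
    }
}
```

    With one million simulated blobs this walks 200 pages, which matches the "at least 200 requests" estimate. Against the real service, any one of those requests can hit the timeout, so a failure may show up long after the first results have arrived.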

    Does that help?

    Thanks!


    -Jeff
    Friday, October 14, 2011 6:13 PM
    Moderator

All replies

  • Hi,

    Are you using cloud storage or local storage? How many blobs are there in the container? Do you get the same issue if you use paging (by setting the maxresults query string parameter)?

    The maxresults parameter is used to specify the maximum number of blobs to return, including all BlobPrefix elements. If the request does not specify maxresults or specifies a value greater than 5,000, the server will return up to 5,000 items.
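
    As a rough illustration of the maxresults rule above, the sketch below computes the page size the server would use and the minimum number of requests needed to list a container. PagingMath, EffectivePageSize and RequestsNeeded are hypothetical helper names made up for this example, not part of the storage library, and the code assumes maxresults is a positive number when it is supplied.

```csharp
using System;

static class PagingMath
{
    // Server-side cap on items per response, as described above.
    const int ServerCap = 5000;

    // Page size used for a given maxresults value
    // (null means the parameter was omitted from the request).
    public static int EffectivePageSize(int? maxResults)
    {
        if (!maxResults.HasValue || maxResults.Value > ServerCap)
        {
            return ServerCap;
        }
        return maxResults.Value;
    }

    // Lower bound on the number of list requests: the server returns
    // "up to" a full page, so the real count can be higher.
    public static long RequestsNeeded(long totalBlobs, int? maxResults)
    {
        int page = EffectivePageSize(maxResults);
        return (totalBlobs + page - 1) / page;
    }
}
```

    For example, RequestsNeeded(1000000, null) and RequestsNeeded(1000000, 10000) both give 200, while RequestsNeeded(1000000, 1000) gives 1000. A smaller maxresults makes each response quicker to produce but multiplies the number of round trips.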

     

    Best Regards,

    Ming Xu.


    Friday, October 14, 2011 6:31 AM
    Moderator