How to achieve best upload performance to Blob storage?

  • Question

  • Hi everyone,

    I'm working on a program that processes images by resizing each picture to four different widths.
    Since we have a lot of pictures, a couple of million, I've been experimenting with the Parallel / TPL APIs to get the best possible performance.

    At the moment the bottleneck is uploading the images to blob storage. The pictures are about 8, 32, 128 and 356 KB in size.

    I have one outer loop (run with Parallel.ForEach) that executes some logic for each business object, and each business object holds a number of images that should be uploaded. Inside this outer loop I have another loop (not run with TPL) that resizes each image and uploads it to blob storage.

    However, uploading the images simply takes too much time. Does anyone have a good idea of how this can be optimized?

    Any suggestions are most welcome and appreciated!

    Regards Niclas

    public void Process()
            {
                var parallelOptions = new ParallelOptions
                {
                    //MaxDegreeOfParallelism = System.Environment.ProcessorCount,
                    MaxDegreeOfParallelism = 4,
                    
                };
                
                // Outer loop that processes my business objects
                Parallel.ForEach(properties, parallelOptions, p =>
                {
                    // Do some business logic here.
    
                    // Process its images.
                    var keys = new List<int>(urls.Count());
    
                    // Start resizing images and upload them to blob storage.
                    for (var i = 0; i < urls.Count(); i++)
                    {
                        // Download from the web...
                        Image img = Download(urls[i]);
    
                        // Resize....
                        foreach (int width in ImageSizes)
                        {
                            try
                            {
                                byte[] buffer;
    
                                using (var ms = new MemoryStream())
                                {
                                    ImageBuilder.Current.Build(img.Bytes, ms, new ResizeSettings { Width = width });
                                    // ToArray() copies only the bytes written; GetBuffer() also returns unused capacity.
                                    buffer = ms.ToArray();
                                }
    
                                CloudBlob blob = _client.GetBlobReference(containerName + "/" + objectKey);
                                var options = new BlobRequestOptions { Timeout = TimeSpan.FromSeconds(10) };
                                blob.UploadByteArray(buffer, options);
    
                                blob.Properties.ContentType = "image/jpeg";
                                blob.Properties.CacheControl = "public, max-age=31536000";
    
                                if (metadata != null)
                                {
                                    foreach (Tuple<string, string> tuple in metadata)
                                        blob.Metadata[tuple.Item1] = tuple.Item2;
    
                                    blob.SetMetadata();
                                }
    
                                blob.SetProperties();
    
                                // Done uploading!
    
                            }
                            catch (Exception e)
                            {
                                p.AddWarning(e, IssueTypes.INTERNAL_SERVER_ERROR, "Unhandled exception caught when resizing image object. Size: {0}, Url: {1}.", width, urls[i]);
                            }
                        }        
                    }
                });
    
            }
    

     

     

    Saturday, January 28, 2012 5:55 AM

Answers

  • Hi,

    First of all, do you really need so many loops? I am not sure where the urls list comes from, but as written your logic downloads every URL for every property. So if you have 5 URLs and 5 properties, you are downloading 25 items. Check whether this is really what you want. If the URLs are common to all properties, move the download logic out of the Parallel.ForEach loop. Similarly, check whether you can remove other loops as well.
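
    The difference can be illustrated with a minimal sketch. It assumes the URLs really are shared by all properties, which the original code does not confirm. The class name, the Download stand-in (which only counts calls and returns dummy bytes) and the example URLs are hypothetical; the resize and upload steps are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class HoistDownloadSketch
{
    // Number of calls made to the hypothetical Download stand-in.
    public static int DownloadCount;

    // Stand-in for the real web download (assumption): counts the call
    // and returns dummy bytes instead of touching the network.
    static byte[] Download(string url)
    {
        Interlocked.Increment(ref DownloadCount);
        return new byte[] { 1, 2, 3 };
    }

    // Shape of the posted code: every property downloads every URL.
    public static int DownloadInsideLoop(IList<string> properties, IList<string> urls)
    {
        DownloadCount = 0;
        Parallel.ForEach(properties, p =>
        {
            foreach (var url in urls)
                Download(url);
        });
        return DownloadCount;
    }

    // Hoisted shape: download each URL once, then reuse the bytes.
    public static int DownloadHoisted(IList<string> properties, IList<string> urls)
    {
        DownloadCount = 0;

        // Assumes the URLs are unique; ToDictionary throws on duplicate keys.
        Dictionary<string, byte[]> images = urls.ToDictionary(u => u, u => Download(u));

        Parallel.ForEach(properties, p =>
        {
            foreach (var url in urls)
            {
                // Read-only lookup; the dictionary is not modified inside the loop.
                byte[] bytes = images[url];
                // Resize and upload 'bytes' for property 'p' here.
            }
        });
        return DownloadCount;
    }

    public static void Main()
    {
        var properties = Enumerable.Range(1, 5).Select(i => "property" + i).ToList();
        var urls = Enumerable.Range(1, 5).Select(i => "http://example.com/" + i + ".jpg").ToList();

        Console.WriteLine(DownloadInsideLoop(properties, urls)); // prints 25
        Console.WriteLine(DownloadHoisted(properties, urls));    // prints 5
    }
}
```

    If each property actually has its own URLs, this change does not apply and the download has to stay inside the loop.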

     

    Best Regards,

    Ming Xu.


    • Marked as answer by Arwind - MSFT Tuesday, February 7, 2012 8:05 AM
    Monday, January 30, 2012 11:13 AM

All replies