Hi, just to close this thread, there are a number of factors that may have played a part in this, and we wanted to post them here.
VM size: http://blogs.msdn.com/b/avkashchauhan/archive/2011/01/30/windows-azure-network-io-capability-for-each-vm-size.aspx
With an XL VM you have the entire machine's NIC to yourself; with a Small, Medium, or other smaller size you are co-located with other VMs and must share the NIC. This means a single Small instance has a maximum theoretical throughput of 12.5 MB/s. If the workers are homogeneous (i.e. performing the same workload), we recommend using an XL VM with multiple threads for concurrency. This can also help avoid hitting the maximum throughput per blob partition, since the blob is downloaded 8x fewer times.
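As a rough back-of-the-envelope sketch of the numbers above (not a measurement): 12.5 MB/s is what a 100 Mbit/s link works out to, and the "8x" figure is read here as eight workers sharing one download on an XL VM. The function names and the 1000 MB blob size are hypothetical, chosen only for illustration.

```python
def mbit_to_mbyte_per_s(megabits_per_second: float) -> float:
    """Convert a link speed in Mbit/s to MB/s (8 bits per byte)."""
    return megabits_per_second / 8.0


def min_download_seconds(blob_mb: float, throughput_mb_per_s: float) -> float:
    """Best-case transfer time, ignoring latency and protocol overhead."""
    return blob_mb / throughput_mb_per_s


def downloads_needed(workers: int, workers_per_vm: int) -> int:
    """Number of times the blob is fetched if each VM downloads it once
    and shares it among the workers (threads) it hosts."""
    return -(-workers // workers_per_vm)  # ceiling division


if __name__ == "__main__":
    small_mb_s = mbit_to_mbyte_per_s(100)  # theoretical cap for one Small instance
    print(small_mb_s)
    # Hypothetical 1000 MB blob on one Small instance, best case:
    print(min_download_seconds(1000, small_mb_s))
    # 8 workers on 8 Small VMs vs. 8 threads on one XL VM (assumed 8 workers per XL):
    print(downloads_needed(8, 1), downloads_needed(8, 8))
```

Real throughput will be lower than these ceilings, since the NIC is shared on smaller sizes and each request carries its own overhead.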
Download Method:
The blob read stream begins by fetching the list of blocks and then issues a separate download for each block (4 MB by default). This adds some overhead: the initial Get Block List call, plus the round-trip time of each subsequent request. Also, whenever possible, use the concrete-type method GetBlockBlobReference to avoid the added latency of an internal call to FetchAttributes to determine the blob type.
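To get a feel for that overhead, here is a minimal sketch that counts the requests a block-by-block read would issue. It assumes one block-list request followed by one request per 4 MB block, issued sequentially; the 50 ms round-trip time and the helper names are hypothetical.

```python
import math

DEFAULT_BLOCK_MB = 4  # default block size mentioned above


def request_count(blob_mb: float, block_mb: int = DEFAULT_BLOCK_MB) -> int:
    """One request for the block list plus one download per block."""
    block_requests = math.ceil(blob_mb / block_mb)
    return 1 + block_requests


def latency_overhead_ms(blob_mb: float, round_trip_ms: int,
                        block_mb: int = DEFAULT_BLOCK_MB) -> int:
    """Total round-trip latency if the requests are issued one after another.
    This counts only turnaround time, not the time spent transferring bytes."""
    return request_count(blob_mb, block_mb) * round_trip_ms


if __name__ == "__main__":
    # Hypothetical 100 MB blob with an assumed 50 ms round trip per request.
    print(request_count(100))            # 1 block-list call + 25 block downloads
    print(latency_overhead_ms(100, 50))  # turnaround time in milliseconds
```

The smaller the round trip (for example, when compute and storage sit in the same data center), the less this per-request turnaround matters.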
Co-locate your data:
When a compute instance is co-located with the storage account (in the same data center, e.g. North Central US), transactions travel over the internal network, which is both free (for bandwidth; you still pay for transactions) and much faster. We always recommend making sure compute and storage are in the same location.
Stagger Requests to a single partition:
A blob's partition is its canonical name ([accountname][containername][blobname]), so issuing several concurrent requests to the exact same blob may exceed the partition target of 60 MB/s. If you need to scale further (more concurrent reads), consider temporarily issuing a few Copy Blob operations to duplicate the blob during startup, then cleaning up the copies later.
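A quick way to estimate how many copies might be worthwhile is sketched below. It assumes the 60 MB/s target is split evenly among readers of the same blob and that readers are spread evenly across copies; both are simplifications, and the function names and example numbers are hypothetical.

```python
import math

PARTITION_TARGET_MB_S = 60.0  # per-blob partition target cited above


def per_reader_share(readers: int, copies: int = 1) -> float:
    """Approximate MB/s available to each reader when the readers are
    spread evenly over `copies` identical blobs."""
    readers_per_copy = math.ceil(readers / copies)
    return PARTITION_TARGET_MB_S / readers_per_copy


def copies_needed(readers: int, wanted_mb_s_per_reader: float) -> int:
    """Smallest number of blob copies whose combined target covers the
    total throughput the readers ask for."""
    total_wanted = readers * wanted_mb_s_per_reader
    return max(1, math.ceil(total_wanted / PARTITION_TARGET_MB_S))


if __name__ == "__main__":
    # 10 concurrent readers, each hoping for the 12.5 MB/s Small-instance ceiling.
    print(per_reader_share(10))        # share per reader with a single blob
    print(copies_needed(10, 12.5))     # copies needed to cover 125 MB/s in total
    print(per_reader_share(10, 3))     # share per reader once 3 copies exist
```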
-Jeff