Azure PageBlob low performance - fragmentation issue
-
14 August 2012 20:27
I am using Azure PageBlobs to store 32 bit integers for a collection of items in the following manner:
- I have a single container called Index.
- In this container I create a PageBlob 512 bytes in size, for each item. I have a lot of items - millions.
- I don't know how many integers each item will be associated with, so when I find an integer that I can associate with the item, I download the last page in the page blob, find an empty spot and upload the updated page back. If I don't find an empty spot, I resize the page blob by adding a single new page to it, as follows (code idea found on the storage team's blog):
    var pageBlob = (...)
    var requestUri = pageBlob.Uri;
    if (StorageAccount.Credentials.NeedsTransformUri)
    {
        requestUri = new Uri(StorageAccount.Credentials.TransformUri(requestUri.ToString()));
    }
    var request = BlobRequest.SetProperties(requestUri, TimeOutInSeconds, pageBlob.Properties, null, BlobPageSize * pages);
    request.Timeout = TimeOutInSeconds * 1000;
    StorageAccount.Credentials.SignRequest(request);
    using (var response = request.GetResponse()) { }

Unfortunately when I try to download the page blob with about 1000+ pages in it, I get really bad performance. If I do it from my local machine (100 Mbit downlink), a 500 KB blob takes up to 13 seconds to download. I've provisioned a VM in the same data center as my storage account, and performance in the cloud is somewhat better - on average 3.5 seconds to my 13. Both systems download a randomly filled (nonzero) 4 MB page blob in 1 second or less, if I allocate all the pages at once.
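For readers following the append step described above (download the last page, look for an empty spot, write the integer), here is a minimal in-memory sketch. It is not the poster's code: the `PageSlots` class and both method names are hypothetical, and it assumes a slot containing 0 counts as "empty", which the thread does not state.

```csharp
using System;

static class PageSlots
{
    public const int PageByteCount = 512; // page size used throughout the thread
    public const int SlotByteCount = 4;   // one 32-bit integer per slot

    // Assumption: a slot holding 0 is "empty". The thread does not say how
    // empty slots are marked, so adjust this if 0 is a legitimate value.
    // Returns the index of the first empty slot, or -1 if the page is full.
    public static int FindEmptySlot(byte[] page)
    {
        if (page == null || page.Length != PageByteCount)
            throw new ArgumentException("Expected exactly one 512-byte page.", "page");

        for (var offset = 0; offset < PageByteCount; offset += SlotByteCount)
        {
            if (BitConverter.ToInt32(page, offset) == 0)
                return offset / SlotByteCount;
        }
        return -1; // page is full: the caller has to grow the blob
    }

    // Writes the value into the given slot of the in-memory page.
    public static void WriteSlot(byte[] page, int slot, int value)
    {
        var bytes = BitConverter.GetBytes(value);
        Buffer.BlockCopy(bytes, 0, page, slot * SlotByteCount, SlotByteCount);
    }
}
```

A return value of -1 corresponds to the case where the blob has to be resized before the integer can be stored.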
I am able to easily reproduce this behavior.
How should I mitigate this issue? Add pages in powers of two? Defragment each n inserts? Is there any documentation about this issue?
All replies
-
15 August 2012 0:11
Could you write a simple program to reproduce the issue? I.e., given a storage account and key, create a page blob that starts at 4MB, create one that starts at 512B and grows 512B at a time until it's 4MB, and then download them both (and time it).
Note that with page blobs, you're only charged for the pages you're actually using (the "valid" pages), so there's no harm in creating a 1TB page blob to start with and using Get Page Ranges to keep track of which pages are actually being used. That said, without understanding what the performance issue is, this may not help. (I wouldn't expect it to help, but then again, I wouldn't expect the performance difference you're seeing.)
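To make the "valid pages" bookkeeping concrete, here is a small sketch of the arithmetic one might run on the result of Get Page Ranges. `ValidRange` and `PageRangeMath` are hypothetical stand-ins, not SDK types, and the sketch assumes each range is a pair of inclusive byte offsets (for example 0 to 511 for the first page).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one range reported by Get Page Ranges.
// Assumption: both offsets are byte offsets and both ends are inclusive.
struct ValidRange
{
    public readonly long StartOffset;
    public readonly long EndOffset;

    public ValidRange(long startOffset, long endOffset)
    {
        StartOffset = startOffset;
        EndOffset = endOffset;
    }
}

static class PageRangeMath
{
    // Total number of bytes covered by the valid ranges.
    public static long UsedBytes(IEnumerable<ValidRange> ranges)
    {
        long total = 0;
        foreach (var range in ranges)
            total += range.EndOffset - range.StartOffset + 1;
        return total;
    }

    // First byte offset after the last valid page, i.e. where an append
    // into a large preallocated blob could go next.
    public static long NextAppendOffset(IEnumerable<ValidRange> ranges)
    {
        long next = 0;
        foreach (var range in ranges)
            next = Math.Max(next, range.EndOffset + 1);
        return next;
    }
}
```

With a 1 TB blob created up front, `NextAppendOffset` would play the role that the blob length plays in the resize-on-demand approach.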
-
15 August 2012 2:00
Here is sample code. It'll take quite a while to create the fragmented blob, so have some patience. Once the fragmented blob is created, you'll note that all read operations on it are considerably slower.
    using System;
    using System.IO;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;
    using Microsoft.WindowsAzure.StorageClient.Protocol;

    namespace BlobSpeedTest
    {
        class Program
        {
            private const string
                AccountName = "AccountName",
                AccountKey = "AccountKey",
                ContainerName = "test",
                SequentialLayoutBlobName = "sequential",
                FragmentedBlobName = "fragmented";

            private const int
                NumberOfPagesToTest = 1000,
                PageByteCount = 512;

            private static readonly Random Random = new Random();

            private static CloudStorageAccount Account
            {
                get
                {
                    return new CloudStorageAccount(
                        new StorageCredentialsAccountAndKey(AccountName, AccountKey),
                        new Uri(string.Format("http://{0}.blob.core.windows.net", AccountName)),
                        new Uri(string.Format("http://{0}.queue.core.windows.net", AccountName)),
                        new Uri(string.Format("http://{0}.table.core.windows.net", AccountName)));
                }
            }

            private static CloudPageBlob GetPageBlobReference(string containerName, string name)
            {
                var client = Account.CreateCloudBlobClient();
                var container = client.GetContainerReference(containerName);
                return container.GetPageBlobReference(name);
            }

            private static void ResizePageBlob(string containerName, string name, int pages)
            {
                // This example has been taken from the Azure team's blog and modified slightly.
                // It causes the page blob to grow or shrink to the specified number of pages.
                // Specifically, the timeout units specified in the blog were wrong.
                const int timeOutInSeconds = 90;
                var blob = GetPageBlobReference(containerName, name);
                var requestUri = Account.Credentials.NeedsTransformUri
                    ? new Uri(Account.Credentials.TransformUri(blob.Uri.ToString()))
                    : blob.Uri;
                var request = BlobRequest.SetProperties(requestUri, timeOutInSeconds, blob.Properties, null, PageByteCount*pages);
                request.Timeout = timeOutInSeconds*1000;
                Account.Credentials.SignRequest(request);
                // Dispose the response so the connection is released before the next request.
                using (request.GetResponse()) { }
            }

            private static void DownloadBlob(string containerName, string name)
            {
                var blob = GetPageBlobReference(containerName, name);
                Console.WriteLine("Downloading blob {0}...", name);
                var sw = System.Diagnostics.Stopwatch.StartNew();
                var buffer = blob.DownloadByteArray();
                Console.WriteLine("Downloaded {0} bytes ({2} pages), in {1:0.00} s",
                    buffer.Length, sw.ElapsedMilliseconds/1000d, buffer.Length/PageByteCount);
            }

            private static void CreatePageBlob(string containerName, string name, int pages)
            {
                // Create a blob with a specified number of pages
                var blob = GetPageBlobReference(containerName, name);
                blob.Create(pages*PageByteCount);

                // Fill buffer with random data
                var buffer = new byte[pages*PageByteCount];
                Random.NextBytes(buffer);
                Console.WriteLine("Uploading normal page blob {0}...", name);

                // Upload the entire blob
                var sw = System.Diagnostics.Stopwatch.StartNew();
                using (var stream = new MemoryStream(buffer))
                    blob.WritePages(stream, 0);
                Console.WriteLine("Uploaded {0:0.0} KB ({2} pages), in {1:0.00} s",
                    buffer.Length/1024d, sw.ElapsedMilliseconds/1000d, buffer.Length/PageByteCount);
            }

            private static void CreateFragmentedPageBlob(string containerName, string name, int pages)
            {
                // Start with a single page, then grow and write one page at a time
                var blob = GetPageBlobReference(containerName, name);
                blob.Create(PageByteCount);
                for (var i = 0; i < pages; ++i)
                {
                    Console.Write("Processing page {0} out of {1}...", i + 1, pages);
                    ResizePageBlob(containerName, name, i + 1);
                    var pageBuffer = new byte[PageByteCount];
                    Random.NextBytes(pageBuffer);
                    var sw = System.Diagnostics.Stopwatch.StartNew();
                    using (var stream = new MemoryStream(pageBuffer))
                        blob.WritePages(stream, PageByteCount*i);
                    Console.Write("done ({0} ms)\n", sw.ElapsedMilliseconds);
                }
                Console.WriteLine("Done uploading...");
            }

            public static void Write(string str)
            {
                Console.SetCursorPosition(0, Console.CursorTop);
                Console.Write(str.PadRight(Console.WindowWidth, ' '));
            }

            private static void Main()
            {
                // Create container
                Account
                    .CreateCloudBlobClient()
                    .GetContainerReference(ContainerName)
                    .CreateIfNotExist();

                // Create two similar blobs
                CreatePageBlob(ContainerName, SequentialLayoutBlobName, NumberOfPagesToTest);
                CreateFragmentedPageBlob(ContainerName, FragmentedBlobName, NumberOfPagesToTest);

                // Observe the download time difference
                DownloadBlob(ContainerName, SequentialLayoutBlobName);
                DownloadBlob(ContainerName, FragmentedBlobName);
            }
        }
    }
-
15 August 2012 4:43
This is perfect, thank you! I'm giving it a try now and will then see if I can spot anything...
-
15 August 2012 5:17
All my tests were from a VM on my laptop at my house.
My results varied a lot. On the first run, it was 14.4 seconds for "sequential" vs. 16.45 seconds for "fragmented." Second run: 5.44, 6.85. Third run: 3.34, 8.10. (On subsequent runs, I didn't recreate the blobs, just downloaded them.) Fourth: 11.36, 13.02.
Every time the "fragmented" blob is slower, so I decided to switch the order. The first run with the reversed order was similar in result to the previous runs, but the second run gave 1.52 seconds versus 3.24 seconds, the "fragmented" blob being faster. A few more runs showed "sequential" being faster again.
Aside from one test, my results agreed with yours, in that the "fragmented" blob took longer to download, despite being the same size. I'm going to ping some folks on the Windows Azure storage team and see if they can explain what's going on.
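Since single runs over a home connection vary this much, one way to compare the two blobs more fairly is to repeat the downloads, alternate which one goes first, and look at the median. The sketch below is only an illustration of that idea; `DownloadTiming`, `Compare` and `Median` are hypothetical names, and the actions passed in are whatever download calls you want to time.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class DownloadTiming
{
    // Median of the samples; for an even count, the mean of the two middle values.
    public static double Median(List<double> samples)
    {
        if (samples.Count == 0)
            throw new ArgumentException("At least one sample is required.", "samples");

        var sorted = new List<double>(samples);
        sorted.Sort();
        var mid = sorted.Count / 2;
        return sorted.Count % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Runs both actions 'rounds' times, swapping the order on every other
    // round, and returns { median seconds of first, median seconds of second }.
    public static double[] Compare(Action first, Action second, int rounds)
    {
        var firstTimes = new List<double>();
        var secondTimes = new List<double>();
        for (var i = 0; i < rounds; i++)
        {
            if (i % 2 == 0)
            {
                firstTimes.Add(Time(first));
                secondTimes.Add(Time(second));
            }
            else
            {
                secondTimes.Add(Time(second));
                firstTimes.Add(Time(first));
            }
        }
        return new[] { Median(firstTimes), Median(secondTimes) };
    }

    private static double Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        return sw.Elapsed.TotalSeconds;
    }
}
```

With the sample program from this thread, the two actions could be lambdas that call `DownloadBlob` for the "sequential" and "fragmented" blobs. Note that this does not control for the server-side caching effects mentioned elsewhere in the thread.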
-
15 August 2012 5:48
Thanks for checking it out. I've now adjusted my algorithm to expand blob sizes in powers of two, hoping that this will mitigate the issue. I cannot preallocate larger chunks, because I fear I might eventually run into the 100 TB account storage limit. I would really like to know what the storage guys suggest I do, and what performance guarantees exist.
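A power-of-two growth policy like the one described here can be sketched as a pure function that maps the number of pages needed to the page count to resize to. This is an illustration, not the poster's actual code; `BlobGrowth` and `NextPageCapacity` are hypothetical names, and the upper bound assumes the 1 TB page blob size mentioned earlier in the thread.

```csharp
using System;

static class BlobGrowth
{
    public const int PageByteCount = 512;

    // Assumption: 1 TB is the largest page blob, as mentioned earlier in the thread.
    public const long MaxPages = (1L << 40) / PageByteCount;

    // Smallest power of two that is >= requiredPages.
    public static long NextPageCapacity(long requiredPages)
    {
        if (requiredPages < 1 || requiredPages > MaxPages)
            throw new ArgumentOutOfRangeException("requiredPages");

        long capacity = 1;
        while (capacity < requiredPages)
            capacity *= 2;
        return capacity;
    }
}
```

The resize call would then receive `NextPageCapacity(pagesNeeded)` as the page count instead of `pagesNeeded`, so a blob that reaches 1000 pages is resized about ten times rather than a thousand.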
-
15 August 2012 7:39
I'm 90% sure that unused pages in your page blobs won't count against your 100TB limit.
-
15 August 2012 17:53
I am seeing the same performance issue when growing many blobs in the same container in powers of two. A 1 MB blob takes 10 s to download. This is just unacceptable. I will try a different growth strategy, but at this point I'm not sure what good it will do.
Additionally, this is not a resize issue. If you comment out the blob resize method in the code sample and create the "fragmented" blob as large as the "sequential" blob, it will still perform much more poorly when downloaded, if it has been stored one page at a time.
I've also noted that sometimes Azure will return the fragmented blob very fast, if requested immediately after a slow request. I am assuming this is some kind of caching magic going on; in any case it doesn't seem to help my particular use case.
-
21 August 2012 4:48
It is not the Create API that is significant here. The fact that the data was written in many small fragments rather than in larger chunks may cause the difference in performance you are seeing when downloading blobs.
- Marked as answer by Arwind - MSFTModerator 3 September 2012 7:57
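If write size is what matters, one possible mitigation is to collect integers in memory and write them as a single page-aligned buffer instead of one page per write. The sketch below only shows the packing step and is not a tested fix for the slowdown reported in this thread; `PageBatching` and `PackIntoPages` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class PageBatching
{
    public const int PageByteCount = 512;

    // Packs the values into one buffer whose length is rounded up to a
    // multiple of 512 bytes, so it can be sent in a single page write.
    // Unused bytes at the end stay zero.
    public static byte[] PackIntoPages(IList<int> values)
    {
        var rawLength = values.Count * sizeof(int);
        var pages = (rawLength + PageByteCount - 1) / PageByteCount;
        var buffer = new byte[pages * PageByteCount];

        for (var i = 0; i < values.Count; i++)
        {
            var bytes = BitConverter.GetBytes(values[i]);
            Buffer.BlockCopy(bytes, 0, buffer, i * sizeof(int), sizeof(int));
        }
        return buffer;
    }
}
```

The resulting buffer could be wrapped in a `MemoryStream` and passed to `WritePages`, the same way `CreatePageBlob` in the sample program uploads its data in one call.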