How would I copy a blob container full of a large amount of data from one storage account to another?

  • Question

  • We have a process in an Azure worker role that takes a long time, a day or so, to process data. The data sits in a blob container in one storage account, and we need to move all of it, about 12 GB, to a blob container in another storage account as quickly as possible; this is time-critical, and we can't afford to rerun the whole process. What is the fastest way to do this?

    Thanks,

    John Grant

    N-Play Application Developer


    Keith Peoples

    Tuesday, August 14, 2012 11:30 PM

All replies

  • There's an API method for copying blobs these days. You'll have to call it once per blob, and it's an async process. How long it takes probably depends on whether the accounts are in the same geographic location. I don't have any experience with it yet, so I won't guess at the speed.

    See http://msdn.microsoft.com/en-us/library/windowsazure/dd894037.aspx and http://msdn.microsoft.com/en-us/library/windowsazure/microsoft.windowsazure.storageclient.cloudblob.copyfromblob. It should just be a foreach loop over the source container with an inner call to CopyFromBlob, as in the sketch below.
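
    Something like this should be all it takes (a rough, untested sketch against the 1.7-era StorageClient; the account names, keys, and container name are placeholders, and for a cross-account copy the source blob must be publicly readable or carry a SAS, as discussed further down the thread):

        // Placeholders throughout: substitute your own accounts, keys, and container.
        var srcContainer = new CloudBlobContainer(
            "http://sourceaccount.blob.core.windows.net/mycontainer",
            new StorageCredentialsAccountAndKey("sourceaccount", "sourceKey"));
        var destContainer = new CloudBlobContainer(
            "http://destaccount.blob.core.windows.net/mycontainer",
            new StorageCredentialsAccountAndKey("destaccount", "destKey"));

        foreach (var item in srcContainer.ListBlobs())
        {
            var srcBlob = item as CloudBlob;
            if (srcBlob == null) continue; // skip directory placeholders in the listing

            // The copy happens server side; no blob data flows through this machine.
            var destBlob = destContainer.GetBlobReference(srcBlob.Name);
            destBlob.CopyFromBlob(srcBlob);
        }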

    Wednesday, August 15, 2012 12:25 AM
  • Adding to what Steve said above, I wrote a blog post which uses this functionality to copy an object from Amazon S3 to Windows Azure Blob Storage, which you can read here: http://gauravmantri.com/2012/06/14/how-to-copy-an-object-from-amazon-s3-to-windows-azure-blob-storage-using-copy-blob/. Based on our testing, within the same geographic region it normally takes only a few minutes (less than five) to copy a large amount of data (we copied around 70 GB).

    In your scenario, as Steve said, you would need to enumerate the blobs in the container and then copy each one using this async copy blob functionality.

    If you're looking for a tool to do so, please take a look at Cloud Storage Studio (http://www.cerebrata.com/Products/CloudStorageStudio/WhatsNew.aspx). The latest version we released a few days ago supports this functionality.

    Hope this helps.

    Wednesday, August 15, 2012 3:05 AM
  • This is the point we've reached. We keep getting this error: "Exception: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature." I am using the sample copy method from the blog post almost verbatim, run from my local machine outside the datacenter, against a container holding three small images just for testing purposes. Does anyone have any suggestions?

     

    C:\Users\jgrant\Documents\Visual Studio 2010\Projects\NPlay.BlobTools\BlobCopy\bin\Debug>BlobCopy CopyBlobs "http://nplaydebug.blob.core.windows.net/test-copy-container" "http://nplaytest.blob.core.windows.net/test-copy-container" "nplaydebug" "HIDDENKEY" "nplaytest" "HIDDENKEY"

    Exception: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.

    Details: Microsoft.WindowsAzure.StorageClient.StorageClientException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. ---> System.Net.WebException: The remote server returned an error: (403) Forbidden.
       at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
       at Microsoft.WindowsAzure.StorageClient.EventHelper.ProcessWebResponse(WebRequest req, IAsyncResult asyncResult, EventHandler`1 handler, Object sender) in C:\Users\jgrant\Downloads\WindowsAzure-azure-sdk-for-net-v1.7-June2012-3-g138464b\WindowsAzure-azure-sdk-for-net-138464b\microsoft-azure-api\StorageClient\EventHelper.cs:line 77
       --- End of inner exception stack trace ---
       at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.get_Result() in C:\Users\jgrant\Downloads\WindowsAzure-azure-sdk-for-net-v1.7-June2012-3-g138464b\WindowsAzure-azure-sdk-for-net-138464b\microsoft-azure-api\StorageClient\Tasks\Task.cs:line 103
       at Microsoft.WindowsAzure.StorageClient.Tasks.Task`1.ExecuteAndWait() in C:\Users\jgrant\Downloads\WindowsAzure-azure-sdk-for-net-v1.7-June2012-3-g138464b\WindowsAzure-azure-sdk-for-net-138464b\microsoft-azure-api\StorageClient\Tasks\Task.cs:line 171
       at Microsoft.WindowsAzure.StorageClient.TaskImplHelper.ExecuteImplWithRetry[T](Func`2 impl, RetryPolicy policy) in C:\Users\jgrant\Downloads\WindowsAzure-azure-sdk-for-net-v1.7-June2012-3-g138464b\WindowsAzure-azure-sdk-for-net-138464b\microsoft-azure-api\StorageClient\TaskImplHelper.cs:line 135
       at Microsoft.WindowsAzure.StorageClient.CloudBlob.StartCopyFromBlob(Uri source, AccessCondition sourceAccessCondition, AccessCondition destAccessCondition, BlobRequestOptions options) in C:\Users\jgrant\Downloads\WindowsAzure-azure-sdk-for-net-v1.7-June2012-3-g138464b\WindowsAzure-azure-sdk-for-net-138464b\microsoft-azure-api\StorageClient\CloudBlob.cs:line 755
       at Microsoft.WindowsAzure.StorageClient.CloudBlob.StartCopyFromBlob(Uri source) in C:\Users\jgrant\Downloads\WindowsAzure-azure-sdk-for-net-v1.7-June2012-3-g138464b\WindowsAzure-azure-sdk-for-net-138464b\microsoft-azure-api\StorageClient\CloudBlob.cs:line 732
       at NPlay.BlobTools.BlobCopy.CopyBlobs(CloudBlobContainer srcContainer, CloudBlobContainer destContainer) in C:\Users\jgrant\Documents\visual studio 2010\Projects\NPlay.BlobTools\BlobCopy\BlobCopy.cs:line 96
       at NPlay.BlobTools.BlobCopy.Main(String[] args) in C:\Users\jgrant\Documents\visual studio 2010\Projects\NPlay.BlobTools\BlobCopy\BlobCopy.cs:line 48

    C:\Users\jgrant\Documents\Visual Studio 2010\Projects\NPlay.BlobTools\BlobCopy\bin\Debug>


    N-Play Team

    Wednesday, August 15, 2012 6:55 PM
  • And the code used in the program:

    class BlobCopy
    {
        private static dynamic context = new ListingContext("", null);

        static void Main(string[] args)
        {
            try
            {
                string usage = string.Format("Possible Usages:\n"
                    + "BlobCopy CopyBlobs account1SourceContainer account2SourceContainer account1Name account1Key account2Name account2Key\n");

                if (args.Length < 1)
                    throw new ApplicationException(usage);

                int p = 1;

                switch (args[0])
                {
                    case "CopyBlobs":
                        if (args.Length != 7) throw new ApplicationException(usage);
                        var Storage1Container = args[p++];
                        var Storage2Container = args[p++];
                        var Storage1Name = args[p++];
                        var Storage1Key = args[p++];
                        var Storage2Name = args[p++];
                        var Storage2Key = args[p++];
                        var Storage1Credentials = new StorageCredentialsAccountAndKey(Storage1Name, Storage1Key);
                        var Storage2Credentials = new StorageCredentialsAccountAndKey(Storage2Name, Storage2Key);
                        var sourceContainer = new CloudBlobContainer(Storage1Container, Storage1Credentials);
                        var destinationContainer = new CloudBlobContainer(Storage2Container, Storage2Credentials);
                        CopyBlobs(sourceContainer, destinationContainer);
                        break;

                    default:
                        throw new ApplicationException(usage);
                }

                Console.BackgroundColor = ConsoleColor.Black;
                Console.ForegroundColor = ConsoleColor.Yellow;
                Console.WriteLine("OK");
                Console.ResetColor();
            }
            catch (Exception ex)
            {
                Console.WriteLine("Details: {0}", ex);
            }
        }

        public static void CopyBlobs(CloudBlobContainer srcContainer, CloudBlobContainer destContainer)
        {
            // get the SAS token to use for all blobs
            string blobToken = srcContainer.GetSharedAccessSignature(new SharedAccessBlobPolicy());

            var srcBlobList = srcContainer.ListBlobs(true, BlobListingDetails.None);
            foreach (var src in srcBlobList)
            {
                var srcBlob = src as CloudBlob;

                // Create appropriate destination blob type to match the source blob
                CloudBlob destBlob;
                if (srcBlob.Properties.BlobType == BlobType.BlockBlob)
                {
                    destBlob = destContainer.GetBlockBlobReference(srcBlob.Name);
                }
                else
                {
                    destBlob = destContainer.GetPageBlobReference(srcBlob.Name);
                }

                // copy using src blob as SAS
                destBlob.StartCopyFromBlob(new Uri(srcBlob.Uri.AbsoluteUri + blobToken));
            }
        }
    }


    N-Play Team

    Wednesday, August 15, 2012 6:58 PM
  • I think there's an issue with the following line of code in your CopyBlobs() function:

    string blobToken = srcContainer.GetSharedAccessSignature(new SharedAccessBlobPolicy());

    You need to have at least "Read" permissions on the source blob.

    Can you try it with something like this:

    string blobToken = srcContainer.GetSharedAccessSignature(new SharedAccessPolicy()
    {
        Permissions = SharedAccessPermissions.Read,
        SharedAccessExpiryTime = DateTime.UtcNow.AddDays(1),
    });
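
    The token returned there gets appended to each source blob's URI, exactly as your CopyBlobs method already does with srcBlob.Uri.AbsoluteUri + blobToken.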
    
     

    Hope this helps.

    Thanks

    Gaurav


    Wednesday, August 15, 2012 7:04 PM
  • Hi - thanks for the question - did Gaurav's solution help?  He's right that the shared access signature needs to grant at least "Read" access for this to work.  One note, given that you mentioned "a lot" of blobs: you might use an expiration longer than a day, and include some code that checks whether each async copy has completed (see the sketch below).  If you queue a very large amount of data, it could well take more than a day to copy everything, and some of the blob copies may fail if your expiration is too short.

    Of course, a Shared Access Signature created in this manner can't be revoked without regenerating your account keys, so a long expiration isn't appropriate in all circumstances.  In this one, though, it may give you greater success if your data set is very large.
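
    If it helps, here's a rough polling sketch (assuming the June 2012 refresh of the 1.7 StorageClient, where CloudBlob exposes a CopyState property that FetchAttributes() repopulates; the helper name is made up):

        // Hypothetical helper: block until no copy into destContainer is still pending.
        public static void WaitForPendingCopies(CloudBlobContainer destContainer)
        {
            bool pending = true;
            while (pending)
            {
                pending = false;
                foreach (var item in destContainer.ListBlobs())
                {
                    var blob = item as CloudBlob;
                    if (blob == null) continue;           // skip directory placeholders
                    blob.FetchAttributes();               // refresh CopyState from the service
                    if (blob.CopyState == null) continue; // blob wasn't created by a copy
                    if (blob.CopyState.Status == CopyStatus.Pending)
                    {
                        pending = true;
                    }
                    else if (blob.CopyState.Status == CopyStatus.Failed
                          || blob.CopyState.Status == CopyStatus.Aborted)
                    {
                        Console.WriteLine("Copy of {0} did not complete: {1}",
                            blob.Name, blob.CopyState.StatusDescription);
                    }
                }
                if (pending) System.Threading.Thread.Sleep(5000); // back off, then re-poll
            }
        }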

    Let us know if you need anything else!


    -Jeff

    Friday, August 17, 2012 5:29 PM
  • I thought Shared Access Signatures that don't use a container-level access policy can only have a maximum duration of one hour.

    So if you make a SAS that lasts for a day, it has to be done in such a way that you can later revoke access.
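
    For reference, the revocable flavor would look roughly like this (a sketch against the 1.7-era StorageClient, reusing srcContainer from the code above; "copy-policy" is just a made-up identifier):

        // Store a named access policy on the container, then issue a SAS that
        // references it by name. Deleting the policy later revokes every SAS
        // issued against it, without regenerating account keys.
        var permissions = srcContainer.GetPermissions();
        permissions.SharedAccessPolicies.Add("copy-policy", new SharedAccessPolicy
        {
            Permissions = SharedAccessPermissions.Read,
            SharedAccessExpiryTime = DateTime.UtcNow.AddDays(7)
        });
        srcContainer.SetPermissions(permissions);

        // The SAS itself carries only the policy name; the expiry and permissions
        // live on the container and can be changed or removed at any time.
        string blobToken = srcContainer.GetSharedAccessSignature(
            new SharedAccessPolicy(), "copy-policy");

        // To revoke: permissions.SharedAccessPolicies.Remove("copy-policy");
        //            srcContainer.SetPermissions(permissions);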

    Friday, August 17, 2012 5:50 PM
  • I wonder why this isn't just built into the portal itself?  I've often wondered why we can't just copy things across...  For example recreating service bus queues or containers...  Would be much nicer if we could just "copy and paste" things like that as part of the portal functionality.

    Thanks,

    Eric

    • Marked as answer by Arwind - MSFT Monday, September 3, 2012 7:56 AM
    Friday, August 17, 2012 6:48 PM
  • Steve - in the latest storage REST version, we removed the 1-hour limit for SAS tokens that do not use a container-level access policy (users of this, be careful: such tokens cannot be revoked, which is why a container-level access policy remains the preferred method for any SAS that may need to be revoked).

    Eric - thanks for the feature request!


    -Jeff

    • Marked as answer by Arwind - MSFT Monday, September 3, 2012 7:56 AM
    Friday, August 17, 2012 6:56 PM
  • Jeff, thanks! I'd missed that update.
    Friday, August 17, 2012 11:09 PM