Blob service doesn't accept special characters

    Question

  • Hi there,

    We are using the Blob REST API in our product to upload files.

    The documentation says this about blob names:

     - A blob name can contain any combination of characters, but reserved URL characters must be properly escaped.

    We have problems uploading files whose names contain certain special characters, even though they are escaped correctly. We get the following response:

    HTTP/1.1 400 Bad Request
    Content-Type: text/html; charset=us-ascii
    Server: Microsoft-HTTPAPI/2.0
    Date: Wed, 08 Feb 2012 17:11:22 GMT
    Connection: close
    Content-Length: 324
    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
    <HTML><HEAD><TITLE>Bad Request</TITLE>
    <META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
    <BODY><h2>Bad Request - Invalid URL</h2>
    <hr><p>HTTP Error 400. The request URL is invalid.</p>
    </BODY></HTML>

    Here is the list of characters we have found so far that cause problems:

    • 0x0081 (encoded in UTF-8 as %C2%81)
    • 0x007F (encoded in UTF-8 as %7F)
    • 0xE000 (encoded in UTF-8 as %EE%80%80). This one behaves strangely: on its own we can upload such a file, but when it is followed by a Chinese character, the result depends on which character:
    PUT /sync/%EE%80%80.txt HTTP/1.1 <--- success
    
    PUT /sync/%EE%80%80%E4%A1%90.txt HTTP/1.1 < --- fails (the filename: 䡐.txt)
    
    PUT /sync/%EE%80%80%E7%BF%BD.txt HTTP/1.1 <--- success (the filename: 翽.txt)
    

    I don't know how such filenames ended up on the client's system, but they are valid Unicode filenames, and Windows Explorer, Notepad, etc. handle them well.

    Is this a bug, or is it documented somewhere? Why is there an inconsistency when working with the 0xE000 character?

    Thanks,

    IP

    • Edited by IvanP_CBL Wednesday, February 8, 2012 17:22 (text formatting)
    Wednesday, February 8, 2012 17:18
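
The escaped forms quoted in the question can be reproduced by percent-encoding each UTF-8 byte of the file name. The following is a minimal C# sketch of that encoding, not the code from the poster's product; `BlobNameEncoding` and `PercentEncodeUtf8` are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Text;

static class BlobNameEncoding
{
    // Unreserved characters per RFC 3986 section 2.3; every other byte is escaped.
    const string Unreserved =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Percent-encodes the UTF-8 bytes of a single URL path segment.
    // Hypothetical helper: it does not treat '/' as a separator.
    public static string PercentEncodeUtf8(string segment)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(segment))
        {
            if (Unreserved.IndexOf((char)b) >= 0)
                sb.Append((char)b);                       // safe ASCII, keep as-is
            else
                sb.Append('%').Append(b.ToString("X2"));  // e.g. 0x7F -> "%7F"
        }
        return sb.ToString();
    }
}
```

With this helper, "\u0081" becomes %C2%81, "\u007F" becomes %7F, and "\uE000\u4850.txt" becomes %EE%80%80%E4%A1%90.txt, which matches the request lines quoted in the question.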

Answers

  • Hi - thanks for bringing this issue up.  We've confirmed the issue occurs with certain unicode characters and pairs of unicode chars.  Unfortunately, we are unable to provide a fix on our servers at this time, but will be working to see how we can address it in the future.  For now, we suggest handling the issue on the client side and avoiding the use of chars that cause the failure.

    Again, thanks for informing us of the issue!


    -Jeff


    Thursday, February 16, 2012 02:22

All Replies

  • Hi,

    I just tested your file name on my side against a cloud storage account, and Azure Storage accepted your special characters. Are you testing your code locally with the Azure Storage Emulator? Dev storage keeps its data in a local SQL Server Express instance, so my guess is that SQL Server cannot store these special characters and that this causes the problem. If you deploy your application to the cloud, I think it should work fine. My code and a screenshot are below:

                    CloudStorageAccount account = new CloudStorageAccount(new StorageCredentialsAccountAndKey("your account", "your key"), false);
                    CloudBlobClient client = account.CreateCloudBlobClient();
                    CloudBlobContainer container = client.GetContainerReference("test");
                    container.CreateIfNotExist();
    
                    var permission = container.GetPermissions();
                    permission.PublicAccess = BlobContainerPublicAccessType.Container;
                    container.SetPermissions(permission);
    
                    CloudBlob blob = container.GetBlobReference("䡐");
                    //CloudBlob blob = container.GetBlobReference("%EE%80%80%E4%A1%90");
    
    
                    blob.UploadFile(Server.MapPath("Styles/%EE%80%80%E4%A1%90.txt"));

    Screenshot

    Hope it can help you.


    Please mark the replies as answers if they help or unmark if not. If you have any feedback about my replies, please contact msdnmg@microsoft.com Microsoft One Code Framework

    Thursday, February 9, 2012 11:48
    Moderator
  • Hi Arwind,

    Thanks for your answer, but the unsupported character is missing from the blob name in your code (maybe it was lost when copying from the browser).

    Here's the snippet for you to try:

                CloudBlob blob1 = container.GetBlobReference("\uE000\u7FFD");   // <-- Success
                CloudBlob blob2 = container.GetBlobReference("\uE000\u4850");   // <-- causes Bad Request on upload
                CloudBlob blob3 = container.GetBlobReference("\u007F");         // <-- causes Bad Request on upload
                CloudBlob blob4 = container.GetBlobReference("\u0081");         // <-- causes Bad Request on upload

                blob1.UploadFile(@"D:\temp\xxxx.txt"); // <-- Success
                blob2.UploadFile(@"D:\temp\xxxx.txt"); // <-- Fail
                blob3.UploadFile(@"D:\temp\xxxx.txt"); // <-- Fail
                blob4.UploadFile(@"D:\temp\xxxx.txt"); // <-- Fail

    (\u4850 - 䡐 ,  and \u7FFD - 翽)

    Note that all of these are valid characters in the Windows file system and exist in filenames on our client's computer.

    We are uploading to the cloud, not to dev storage (though I don't see any blob name limitations for dev storage either).

    Sincerely,

    IP


    • Edited by IvanP_CBL Thursday, February 9, 2012 14:36
    Thursday, February 9, 2012 14:34
  • Hi,

    I can reproduce your issue. Let me loop in the Azure team; there will be a short delay, and I appreciate your patience.

    Thanks


    Friday, February 10, 2012 02:41
    Moderator
  • One more update.  We have found that the issue is actually "by design": according to HTTP/1.1, these characters and character pairs are invalid for use in a URL.  I've included links to the RFCs describing valid URL characters. They're kind of long, so the summary is simply that some characters are valid in file names but not in HTTP/1.1 URLs.  As above, I suggest detecting/handling the issue on the client side and avoiding the use of characters that cause the failure.

    The valid URL characters are defined here:
    Basic Rules (2.2)
    Unicode range

    -Jeff

    Tuesday, March 6, 2012 00:21
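
The client-side detection suggested in the reply above could look roughly like the following C# sketch. It only knows the two characters and the one pair reported in this thread; the full set of rejected sequences was never listed here, so the checks are assumptions to extend as more cases turn up. `BlobNameCheck` and `ContainsReportedProblem` are hypothetical names.

```csharp
using System;

static class BlobNameCheck
{
    // Returns true if the name contains a character or pair that this thread
    // reported as causing "400 Bad Request - Invalid URL".
    // Assumption: this is only the observed list, not a documented rule.
    public static bool ContainsReportedProblem(string name)
    {
        for (int i = 0; i < name.Length; i++)
        {
            char c = name[i];

            // U+007F and U+0081 were reported to fail on their own.
            if (c == '\u007F' || c == '\u0081')
                return true;

            // U+E000 alone uploaded fine, but failed when followed by U+4850.
            if (c == '\uE000' && i + 1 < name.Length && name[i + 1] == '\u4850')
                return true;
        }
        return false;
    }
}
```

A caller might run this check before each upload and log or rename the file when it returns true, instead of waiting for the service to reject the request.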
  • Hi Jeff!

    Thanks for the update. I don't see any reason why 0x7F, encoded as %7F, is not a valid URL path part (even for IRIs).

    Reading the RFCs, I also cannot find anything that says these characters are invalid in URLs when properly encoded.

    I tried the System.Net.Uri class with the app.config option <iriParsing enabled="true" /> turned on, so that it behaves according to RFC 3987 (as MSDN states), and the names are automatically encoded to the same sequences:

    PUT /sync/%7F.txt HTTP/1.1
    PUT /sync/%EE%80%80%E4%A1%90.txt HTTP/1.1
    PUT /sync/%EE%80%80%E7%BF%BD.txt HTTP/1.1
    Executing HttpWebRequest.GetResponse() on these Uris (which should be RFC 3987-compliant) gives the same results as before: fail for %7F, fail for %EE%80%80%E4%A1%90, and success for %EE%80%80%E7%BF%BD.

    The other point is that the Blob service documentation says the service responds with 400 Bad Request if a name violates the rules. I cannot see any difference between the last two requests in terms of RFC 3987, so they must either both work or both fail.

    I cannot filter filenames before sending until I know how to identify the set of unsupported characters/sequences (or until clients complain that they cannot upload files with our software). What are the unsupported characters/sequences?

    Sincerely,

    IP

    • Edited by IvanP_CBL Tuesday, March 13, 2012 12:31
    Monday, March 12, 2012 18:12
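
Since the set of rejected sequences is not known, one possible workaround is to avoid sending such characters at all: store the blob under a reversible ASCII-only form of the file name and decode it when listing or downloading. The C# sketch below illustrates that idea; it is not something proposed by the Azure team in this thread. `SafeBlobName`, `Encode`, and `Decode` are hypothetical names, and the percent-style escaping is an assumed choice. The stored blob name then literally contains '%' characters, which must themselves be escaped as %25 in the request URL.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SafeBlobName
{
    // Turns a file name into plain ASCII: every UTF-8 byte outside
    // A-Z a-z 0-9 '-' '.' '_' is written as %XX. '%' itself is escaped,
    // so the mapping is reversible. '/' is escaped too, so this is meant
    // for a single name, not a path with virtual directories.
    public static string Encode(string name)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(name))
        {
            bool safe = (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
                        (b >= '0' && b <= '9') || b == '-' || b == '.' || b == '_';
            if (safe)
                sb.Append((char)b);
            else
                sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    // Reverses Encode. Throws on malformed input (for example a trailing '%').
    public static string Decode(string encoded)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < encoded.Length; i++)
        {
            if (encoded[i] == '%')
            {
                bytes.Add(Convert.ToByte(encoded.Substring(i + 1, 2), 16));
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                bytes.Add((byte)encoded[i]);
            }
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

The trade-off is that blob names are no longer human-readable in storage tools and grow up to three times longer per escaped byte, so this only fits cases where the application controls both upload and download.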
  • Hi,

    Is there any update on this issue? Two years later, we are also experiencing it in our production environment.

    We are using the Windows Azure Storage 3.0 package for .NET.

    Best regards,

    Ievgen Baida


    • Edited by Ievgen Baida Tuesday, April 15, 2014 13:08
    Tuesday, April 15, 2014 13:06