08 February 2012 17:18
We are using the Blob REST API in our product to upload files.
The documentation says this about blob names:
- A blob name can contain any combination of characters, but reserved URL characters must be properly escaped.
We have problems uploading files whose names contain certain special characters, even though the names are escaped correctly. We get the following response:
HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=us-ascii
Server: Microsoft-HTTPAPI/2.0
Date: Wed, 08 Feb 2012 17:11:22 GMT
Connection: close
Content-Length: 324

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request - Invalid URL</h2>
<hr><p>HTTP Error 400. The request URL is invalid.</p>
</BODY></HTML>
Here is the list of characters we have found so far that cause problems:
- 0x0081 (encoded as UTF-8: %C2%81)
- 0x007F (encoded as UTF-8: %7F)
- 0xE000 (encoded as UTF-8: %EE%80%80). This one behaves strangely. On its own, a file with this character uploads fine, but when it is followed by a Chinese character, the result depends on which character it is:
  PUT /sync/%EE%80%80.txt HTTP/1.1            <--- success
  PUT /sync/%EE%80%80%E4%A1%90.txt HTTP/1.1   <--- fails (file name: \uE000䡐.txt)
  PUT /sync/%EE%80%80%E7%BF%BD.txt HTTP/1.1   <--- success (file name: \uE000翽.txt)
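To confirm that the request paths above are just the standard UTF-8 percent-encoding of these names, here is a minimal Python sketch. It is our own illustration, not the product code and not part of any Azure SDK; the helper name `escape_blob_path` is hypothetical, and the `sync` container name is taken from the requests above.

```python
from urllib.parse import quote


def escape_blob_path(container: str, blob_name: str) -> str:
    """Build a request path by percent-encoding the blob name as UTF-8.

    Hypothetical helper for illustration only. '/' is left unescaped so that
    virtual-directory separators in blob names are preserved; every other byte
    outside the RFC 3986 unreserved set is escaped as %XX (uppercase hex).
    """
    return "/" + container + "/" + quote(blob_name, safe="/", encoding="utf-8")


# The problematic names from the list above.
for name in ["\u0081", "\u007f", "\ue000.txt", "\ue000\u4850.txt", "\ue000\u7ffd.txt"]:
    print(repr(name), "->", escape_blob_path("sync", name))
```

This only shows that the escaped forms are what any UTF-8 percent-encoder produces; it says nothing about which of them the service will accept.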
I don't know how such file names ended up on the client's system, but they are valid Unicode file names, and Windows Explorer, Notepad, etc. handle them well.
Is this a bug, or is it documented somewhere? Why is there an inconsistency with the 0xE000 character?
- Edited by IvanP_CBL on 08 February 2012 17:22 (text formatting)
09 February 2012 11:48 (Moderator)
I just tested your file name on my side using a cloud storage account, and found that Azure storage can store names containing your special characters. Are you testing your code locally against the Azure Storage Emulator? Development storage keeps the data in SQL Server Express locally, so my guess is that SQL Server cannot store these special characters and that this causes the problem. If you deploy your application to the cloud, I think it should work fine. Here is my code:
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

CloudStorageAccount account = new CloudStorageAccount(
    new StorageCredentialsAccountAndKey("your account", "your key"), false);
CloudBlobClient client = account.CreateCloudBlobClient();

// Create the container (if needed) and make it publicly readable.
CloudBlobContainer container = client.GetContainerReference("test");
container.CreateIfNotExist();
var permission = container.GetPermissions();
permission.PublicAccess = BlobContainerPublicAccessType.Container;
container.SetPermissions(permission);

// Upload a local file under the test blob name.
CloudBlob blob = container.GetBlobReference("䡐");
//CloudBlob blob = container.GetBlobReference("%EE%80%80%E4%A1%90");
blob.UploadFile(Server.MapPath("Styles/%EE%80%80%E4%A1%90.txt"));
Hope this helps.
09 February 2012 14:34
Thanks for your answer, but the unsupported character is missing from the blob name in your code (maybe it was lost when copying from the browser).
Here's the snippet for you to try:
CloudBlob blob1 = container.GetBlobReference("\uE000\u7FFD"); // <-- Success
CloudBlob blob2 = container.GetBlobReference("\uE000\u4850"); // <-- causes Bad Request on upload
CloudBlob blob3 = container.GetBlobReference("\u007F"); // <-- causes Bad Request on upload
CloudBlob blob4 = container.GetBlobReference("\u0081"); // <-- causes Bad Request on upload
blob1.UploadFile(@"D:\temp\xxxx.txt"); // <-- Success
blob2.UploadFile(@"D:\temp\xxxx.txt"); // <-- Fail
blob3.UploadFile(@"D:\temp\xxxx.txt"); // <-- Fail
blob4.UploadFile(@"D:\temp\xxxx.txt"); // <-- Fail
(\u4850 is 䡐, and \u7FFD is 翽.)
Note that all of these are valid characters in the Windows file system, and they occur in file names on our client's computer.
We are uploading to the cloud, not to development storage (though I don't see any blob name limitations for development storage either).
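One way to see what the failing characters have in common is to look up their Unicode general categories. The following Python sketch (our own illustration using the standard `unicodedata` module; the helper name `describe` is hypothetical) prints each code point from the snippet above: U+007F and U+0081 are control characters (category Cc), U+E000 is the first private-use code point (Co), and the two CJK ideographs are ordinary letters (Lo).

```python
import unicodedata

# Code points used in the C# snippet above.
CODE_POINTS = ["\u007f", "\u0081", "\ue000", "\u4850", "\u7ffd"]


def describe(ch: str) -> str:
    """Return 'U+XXXX <category> <name>' for a single character."""
    # Control and private-use characters have no Unicode character name,
    # so unicodedata.name() would raise ValueError without the default.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}"


for ch in CODE_POINTS:
    print(describe(ch))
```

The categories explain why 0x7F and 0x81 stand out, but they do not explain why U+E000 fails before 䡐 and succeeds before 翽, since both of those are category Lo.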
- Edited by IvanP_CBL on 09 February 2012 14:36
10 February 2012 2:41 (Moderator)
16 February 2012 2:22
Hi, thanks for bringing this issue up. We've confirmed that the issue occurs with certain Unicode characters and pairs of Unicode characters. Unfortunately, we are unable to provide a fix on our servers at this time, but we will be working on how to address it in the future. For now, we suggest handling the issue on the client side and avoiding the characters that cause the failure.
Again, thanks for informing us of the issue!
06 March 2012 0:21
One more update. We have found that the issue is actually "by design", as these characters and character pairs are invalid for use in a URL according to HTTP/1.1. I've included links to the RFCs describing valid URL characters. They're fairly long, so the summary is simply that some characters are valid in file names but not valid in HTTP/1.1 URLs. As above, I suggest detecting and handling the issue on the client side and avoiding the characters that cause the failure. The valid URL characters are defined here:
Basic Rules (RFC 2616, section 2.2)
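Following the suggestion to detect the problem on the client, here is a conservative Python sketch. The rule set is an assumption, not a list published by the service: it rejects C0 controls, DEL and C1 controls (U+0000 to U+001F and U+007F to U+009F, which covers the 0x7F and 0x81 failures reported in this thread) and flags every private-use character such as U+E000, because whether those fail appears to depend on the character that follows. The function name `find_risky_chars` is hypothetical.

```python
import unicodedata


def find_risky_chars(blob_name: str) -> list[tuple[int, str]]:
    """Return (index, reason) pairs for characters we choose not to upload.

    Assumed, conservative rules derived from the failures in this thread,
    not from any documented list of unsupported characters.
    """
    problems = []
    for i, ch in enumerate(blob_name):
        cp = ord(ch)
        # C0 controls, DEL and C1 controls.
        if cp <= 0x1F or 0x7F <= cp <= 0x9F:
            problems.append((i, f"control character U+{cp:04X}"))
        # Private-use characters (BMP and supplementary planes).
        elif unicodedata.category(ch) == "Co":
            problems.append((i, f"private-use character U+{cp:04X}"))
    return problems


print(find_risky_chars("report\u007f.txt"))
print(find_risky_chars("\ue000\u7ffd.txt"))
print(find_risky_chars("\u7ffd.txt"))
```

This deliberately rejects some names the service accepts (for example U+E000 followed by 翽); the trade-off is fewer unexpected 400 responses in exchange for renaming or skipping a few extra files.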
12 March 2012 18:12
Thanks for the update. I don't see any reason why 0x7F, encoded as %7F, would not be a valid part of a URL path (even for IRIs).
Reading the RFCs, I also cannot find anything saying that these characters are invalid in URLs once they are properly percent-encoded.
I tried the System.Net.Uri class with the app.config option <iriParsing enabled="true" /> turned on, so that it behaves according to RFC 3987 (as MSDN states), and the names are automatically encoded to the same sequences:
PUT /sync/%7F.txt HTTP/1.1
PUT /sync/%EE%80%80%E4%A1%90.txt HTTP/1.1
PUT /sync/%EE%80%80%E7%BF%BD.txt HTTP/1.1
Executing HttpWebRequest.GetResponse() on these URIs (which should be RFC 3987-compliant) gives the same results as before: fail for %7F, fail for %EE%80%80%E4%A1%90, and success for %EE%80%80%E7%BF%BD.
Another point: the Blob service documentation says that if a name violates the naming rules, the service responds with 400 Bad Request. I cannot see any difference between the last two requests in terms of RFC 3987, so they should either both work or both fail.
I cannot filter file names before sending them until I know how to identify the set of unsupported characters/sequences (or until clients complain that they cannot upload their files with our software). What are the unsupported characters/sequences?
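To check the claim that the three request paths are syntactically indistinguishable, here is a Python sketch that tests a path against the RFC 3986 grammar for path segments (section 3.3: each segment is made of unreserved characters, sub-delims, ':', '@' and %XX escapes). It is an illustrative check we wrote, not the parser that HTTP.sys or the Blob service actually uses; the names `_PCHAR`, `_ABS_PATH` and `is_rfc3986_path` are hypothetical.

```python
import re

# RFC 3986, section 3.3: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# One or more "/" segment pieces, each segment made of zero or more pchars.
_ABS_PATH = re.compile(rf"(?:/{_PCHAR}*)+")


def is_rfc3986_path(path: str) -> bool:
    """True if `path` matches 1*( "/" segment ) from RFC 3986."""
    return _ABS_PATH.fullmatch(path) is not None


for p in ["/sync/%7F.txt",
          "/sync/%EE%80%80%E4%A1%90.txt",
          "/sync/%EE%80%80%E7%BF%BD.txt"]:
    print(p, is_rfc3986_path(p))
```

All three paths pass the same syntactic check, which suggests the 400 response (generated by the Microsoft-HTTPAPI/2.0 layer, per the Server header) is based on the decoded characters rather than on the URL syntax itself.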