Blob service doesn't accept special characters
-
Wednesday, February 8, 2012 17:18
Hi there,
We are using the Blob REST API in our product to upload files.
The documentation says this about blob names:
- A blob name can contain any combination of characters, but reserved URL characters must be properly escaped.
We have problems uploading files whose names contain certain special characters, even though they are escaped correctly. We get the following response:
HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=us-ascii
Server: Microsoft-HTTPAPI/2.0
Date: Wed, 08 Feb 2012 17:11:22 GMT
Connection: close
Content-Length: 324

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request - Invalid URL</h2>
<hr><p>HTTP Error 400. The request URL is invalid.</p>
</BODY></HTML>
Here are the characters we have found that cause problems:
- 0x0081 (UTF-8 encoded as %C2%81)
- 0x007F (UTF-8 encoded as %7F)
- 0xE000 (UTF-8 encoded as %EE%80%80). This one is strange: on its own, we can upload such a file, but when it is followed by a Chinese character, the result depends on that character:
PUT /sync/%EE%80%80.txt HTTP/1.1            <--- success
PUT /sync/%EE%80%80%E4%A1%90.txt HTTP/1.1   <--- fails (the filename: U+E000 + 䡐 + .txt)
PUT /sync/%EE%80%80%E7%BF%BD.txt HTTP/1.1   <--- success (the filename: U+E000 + 翽 + .txt)
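To see which byte sequences a name turns into before it is sent, the encoding above can be reproduced by hand: take the name's UTF-8 bytes and percent-escape every byte that is not an RFC 3986 unreserved character. This is a minimal sketch; `BlobNameEncoding` and `PercentEncodeUtf8` are hypothetical names, not part of the Azure storage client library.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the Azure storage client library.
static class BlobNameEncoding
{
    // RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
    static bool IsUnreserved(byte b)
    {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
               (b >= '0' && b <= '9') ||
               b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Encodes the name as UTF-8, then percent-escapes every byte that is not unreserved.
    public static string PercentEncodeUtf8(string name)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(name))
        {
            if (IsUnreserved(b))
                sb.Append((char)b);
            else
                sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}

static class Program
{
    static void Main()
    {
        string[] names = { "\u0081.txt", "\u007F.txt", "\uE000.txt", "\uE000\u4850.txt", "\uE000\u7FFD.txt" };
        foreach (string name in names)
            Console.WriteLine(BlobNameEncoding.PercentEncodeUtf8(name));
        // Prints:
        // %C2%81.txt
        // %7F.txt
        // %EE%80%80.txt
        // %EE%80%80%E4%A1%90.txt
        // %EE%80%80%E7%BF%BD.txt
    }
}
```

The output matches the sequences in the requests above, which suggests the client side encodes these names correctly. The helper treats the whole string as one path segment, so a "/" in a blob name would become %2F; a name with virtual directories would need to be encoded segment by segment.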
I don't know how such filenames appeared on the client's system, but they are valid Unicode filenames, and Windows Explorer, Notepad, etc. handle them well. Is this a bug, or is it documented somewhere? Why is there an inconsistency when working with the 0xE000 character?
Thanks,
IP
- Edited by IvanP_CBL Wednesday, February 8, 2012 17:22 (text formatting)
All Replies
-
Thursday, February 9, 2012 11:48 (Moderator)
Hi,
I just tested your file name on my side. I used a cloud storage account for testing and found that Azure storage accepts your special characters. Are you testing your code locally with the Azure Storage Emulator? The development storage keeps its data locally in SQL Server Express, so I guess SQL Server cannot store these special characters and that causes the problem. If you deploy your application to the cloud, I think it will work fine. Here is my code:
using Microsoft.WindowsAzure;                // CloudStorageAccount, StorageCredentialsAccountAndKey
using Microsoft.WindowsAzure.StorageClient;  // CloudBlobClient, CloudBlobContainer, CloudBlob

CloudStorageAccount account = new CloudStorageAccount(
    new StorageCredentialsAccountAndKey("your account", "your key"), false);
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("test");
container.CreateIfNotExist();
// Make the container and its blobs publicly readable.
var permission = container.GetPermissions();
permission.PublicAccess = BlobContainerPublicAccessType.Container;
container.SetPermissions(permission);
CloudBlob blob = container.GetBlobReference("䡐");
//CloudBlob blob = container.GetBlobReference("%EE%80%80%E4%A1%90");
blob.UploadFile(Server.MapPath("Styles/%EE%80%80%E4%A1%90.txt"));
Hope it can help you.
Please mark the replies as answers if they help or unmark if not. If you have any feedback about my replies, please contact msdnmg@microsoft.com Microsoft One Code Framework
-
Thursday, February 9, 2012 14:34
Hi Arwind,
Thanks for your answer, but the unsupported character is missing from the blob name in your code (maybe it was lost when copying from the browser).
Here's the snippet for you to try:
CloudBlob blob1 = container.GetBlobReference("\uE000\u7FFD"); // <-- Success
CloudBlob blob2 = container.GetBlobReference("\uE000\u4850"); // <-- causes Bad Request on upload
CloudBlob blob3 = container.GetBlobReference("\u007F"); // <-- causes Bad Request on upload
CloudBlob blob4 = container.GetBlobReference("\u0081"); // <-- causes Bad Request on upload
blob1.UploadFile(@"D:\temp\xxxx.txt"); // <-- Success
blob2.UploadFile(@"D:\temp\xxxx.txt"); // <-- Fail
blob3.UploadFile(@"D:\temp\xxxx.txt"); // <-- Fail
blob4.UploadFile(@"D:\temp\xxxx.txt"); // <-- Fail
(\u4850 is 䡐, and \u7FFD is 翽)
Note that these all are valid characters in Windows file system and exist in filenames on our client's computer.
We are uploading to the cloud, not to the development storage (though I don't see any blob name limitations documented for the development storage).
Sincerely,
IP
- Edited by IvanP_CBL Thursday, February 9, 2012 14:36
-
Friday, February 10, 2012 02:41 (Moderator)
Hi,
I can reproduce your issue. Let me loop in the Azure team; there will be a short delay, so I appreciate your patience.
Thanks
-
Thursday, February 16, 2012 02:22
Hi, thanks for bringing this issue up. We've confirmed that the issue occurs with certain Unicode characters and pairs of Unicode characters. Unfortunately, we are unable to provide a fix on our servers at this time, but we will be working to see how we can address it in the future. For now, we suggest handling the issue on the client side and avoiding the characters that cause the failure.
Again, thanks for informing us of the issue!
-Jeff
- Edited by Jeff Irwin (Microsoft Employee) Thursday, February 16, 2012 02:23
- Marked as answer by Arwind - MSFT (Moderator) Monday, February 27, 2012 08:32
-
Tuesday, March 6, 2012 00:21
One more update. We have found that the issue is actually "by design", as these characters and character pairs are invalid for use in a URL according to HTTP/1.1. I've included links to the RFCs describing valid URL characters. They're fairly long, so the summary is simply that some characters are valid in file names but not in HTTP/1.1 URLs. As above, I suggest detecting and handling the issue on the client side and avoiding the characters that cause the failure.
The valid URL characters are defined here:
Basic Rules (2.2)
Unicode range
-Jeff
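Since no list of rejected characters is given, any client-side check has to be a guess. The following is a minimal sketch of such a heuristic, assuming (from the failures reported in this thread only) that Unicode control characters (U+007F, U+0081) and private-use characters (U+E000) are the ones to watch. `BlobNameCheck` and `FindSuspiciousChars` are hypothetical names, and the categories are not a published service rule.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical client-side heuristic. The flagged categories are inferred from the
// failures reported in this thread, not from any documented Blob service rule.
static class BlobNameCheck
{
    // Returns the indices of UTF-16 code units whose Unicode category is Control
    // (C0 controls, U+007F, C1 controls such as U+0081) or PrivateUse (e.g. U+E000).
    public static List<int> FindSuspiciousChars(string name)
    {
        var positions = new List<int>();
        for (int i = 0; i < name.Length; i++)
        {
            UnicodeCategory cat = char.GetUnicodeCategory(name[i]);
            if (cat == UnicodeCategory.Control || cat == UnicodeCategory.PrivateUse)
                positions.Add(i);
        }
        return positions;
    }
}

static class Program
{
    static void Main()
    {
        foreach (string name in new[] { "report.txt", "\u007F.txt", "\uE000\u4850.txt" })
        {
            List<int> suspicious = BlobNameCheck.FindSuspiciousChars(name);
            Console.WriteLine("{0} suspicious character(s)", suspicious.Count);
        }
    }
}
```

The check is deliberately conservative: it also flags "\uE000\u7FFD", which uploaded successfully above, and because it works per UTF-16 code unit it does not detect private-use characters outside the Basic Multilingual Plane (those appear as surrogate pairs).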
-
Monday, March 12, 2012 18:12
Hi Jeff!
Thanks for the update. I don't see any reason why 0x7F, encoded as %7F, would not be a valid part of a URL path (even for IRIs).
Reading the RFCs, I also cannot find anything saying that these characters are invalid in URLs when they are properly percent-encoded.
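At the level of URI syntax this reading holds up. RFC 3986 defines a path segment as `segment = *pchar` with `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`, and `pct-encoded` is `"%" HEXDIG HEXDIG` for any byte value. The sketch below checks only that grammar; `Rfc3986` and `IsValidSegment` are hypothetical names, and the check says nothing about what a particular server decides to reject after decoding.

```csharp
using System;

// Hypothetical syntax checker for a single RFC 3986 path segment:
//   segment     = *pchar
//   pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
//   pct-encoded = "%" HEXDIG HEXDIG
static class Rfc3986
{
    static bool IsHex(char c) =>
        (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');

    static bool IsUnreserved(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
        c == '-' || c == '.' || c == '_' || c == '~';

    static bool IsSubDelim(char c) => "!$&'()*+,;=".IndexOf(c) >= 0;

    public static bool IsValidSegment(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c == '%')
            {
                // A percent sign must be followed by exactly two hex digits.
                if (i + 2 >= s.Length || !IsHex(s[i + 1]) || !IsHex(s[i + 2]))
                    return false;
                i += 2; // skip the two hex digits
            }
            else if (!IsUnreserved(c) && !IsSubDelim(c) && c != ':' && c != '@')
            {
                return false;
            }
        }
        return true;
    }
}

static class Program
{
    static void Main()
    {
        foreach (string seg in new[] { "%7F.txt", "%EE%80%80%E4%A1%90.txt", "%EE%80%80%E7%BF%BD.txt" })
            Console.WriteLine("{0}: {1}", seg, Rfc3986.IsValidSegment(seg));
    }
}
```

All three segments are syntactically well-formed, so the 400 responses are not explained by URI grammar alone; they come from a policy applied to the decoded characters, which is exactly the rule set the thread is asking to have documented.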
I tried the System.Net.Uri class with the app.config option <iriParsing enabled="true" /> turned on, so that it behaves according to RFC 3987 (as MSDN states), and the names are simply encoded automatically to the same sequences:
PUT /sync/%7F.txt HTTP/1.1
PUT /sync/%EE%80%80%E4%A1%90.txt HTTP/1.1
PUT /sync/%EE%80%80%E7%BF%BD.txt HTTP/1.1
Executing HttpWebRequest.GetResponse() on these Uris (which should be RFC 3987-compliant) gives the same results: failure for %7F, failure for %EE%80%80%E4%A1%90, and success for %EE%80%80%E7%BF%BD.
Another point: the Blob service documentation says that if a name violates the rules, the service responds with 400 Bad Request. I cannot see any difference between the last two requests in terms of RFC 3987, so they should either both work or both fail.
I cannot filter filenames before sending them until I know how to identify the set of unsupported characters/sequences (or until clients complain that they cannot upload their files with our software). What are the unsupported characters/sequences?
Sincerely,
IP

