Uploaded BlockBlob in UTF-8 format but downloads as ANSI

  • Question

  • Hi,

    I'm uploading some UTF-8 encoded CSV files to Azure Storage using the SDK, explicitly setting the ContentType and a UTF-8 encoding:

            outputBlob.Properties.ContentType = "text/html; charset=utf-8";
            outputBlob.UploadText(content, new UTF8Encoding(true));

    When I use Azure Storage Explorer or CyberDuck to download the files and inspect them with Notepad++, it tells me they are in ANSI format and the characters are corrupt.

    How do I overcome this and ensure the files are treated as UTF-8?

    Thanks,

    Mike

    Thursday, April 16, 2015 9:26 AM

Answers

  • I'm not sure about enforcing the UTF-8 with BOM encoding. However, I do know that when you upload a file with the content type specified, Azure stores the bytes unchanged, and the same bytes come back when the file is downloaded.

    The program you use to read these text/csv files decides whether to interpret each one as UTF-8 or ANSI. You may have to convert the files yourself to fix the issue.

    Regards,

    Manu

    Friday, April 17, 2015 7:12 PM
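
    If the BOM must be present, one possible workaround is to prepend it yourself. This is only a sketch, assuming `outputBlob` and `content` are the variables from the question, and relying on the fact that `Encoding.GetBytes` never emits the preamble (so `UploadText` with `new UTF8Encoding(true)` still produces a BOM-less blob):

    ```csharp
    // Sketch: UTF8Encoding(true) only *reports* a preamble; Encoding.GetBytes
    // never emits it. Prepend the three BOM bytes (EF BB BF) by hand.
    // Requires: using System; using System.Text;
    byte[] preamble = new UTF8Encoding(true).GetPreamble();
    byte[] body = Encoding.UTF8.GetBytes(content);
    byte[] payload = new byte[preamble.Length + body.Length];
    Buffer.BlockCopy(preamble, 0, payload, 0, preamble.Length);
    Buffer.BlockCopy(body, 0, payload, preamble.Length, body.Length);

    outputBlob.Properties.ContentType = "text/csv; charset=utf-8";
    outputBlob.UploadFromByteArray(payload, 0, payload.Length);
    ```

    `UploadFromByteArray` is assumed here from the same storage SDK; uploading via `UploadFromStream` with a `StreamWriter` constructed over a `UTF8Encoding(true)` (which does write the preamble) would work equally well. Excel should then recognise the downloaded CSV as UTF-8.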

All replies

  • Hi Mike,

    Please check through Azure Storage Explorer whether the "content type" of the blob matches the one you set on upload.

    Also, as you are referring to CSV files, use the lines below:

    outputBlob.Properties.ContentType = "text/csv; charset=utf-8";
    outputBlob.UploadText(content, new UTF8Encoding(true));

    Regards,

    Manu

    Thursday, April 16, 2015 4:49 PM
  • Hi,

    Yes, the ContentType displays as expected in Azure Storage Explorer. After more investigation it seems some files open in Notepad++ reporting ANSI, whereas others containing special characters report UTF-8 without BOM.

    I'm wondering if Notepad++ can't detect UTF-8 without BOM unless the file contains special characters, and otherwise assumes ANSI?

    The loss of the BOM is an issue, as without it Excel opens the CSV with corrupt characters. Converting the file to UTF-8 with BOM fixes it. Is there a way to force the UTF-8 with BOM encoding?

    Thanks,

    Mike


    • Edited by TheCodeKing Thursday, April 16, 2015 5:13 PM
    Thursday, April 16, 2015 5:13 PM
  • I am also facing a similar issue: I have uploaded a file to a block blob. When I download the file through Storage Explorer it has valid contents, but when I download it using C# code the characters are corrupted.

    CloudBlockBlob blob = (CloudBlockBlob)item;
    string blobcontent = string.Empty;
    using (WebClient client = new WebClient())
    {
        blobcontent = client.DownloadString(blob.Uri);
    }

    Here some of the blob contents (which have special characters) got corrupted.

    Any suggestions?

    Thursday, November 29, 2018 8:16 AM
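
    A likely cause, though it depends on the blob's stored Content-Type: `WebClient.DownloadString` only honours a charset if the response's Content-Type header declares one; otherwise it decodes with the `WebClient.Encoding` property, which defaults to the system ANSI code page on .NET Framework. A sketch of forcing UTF-8, assuming `item` is the blob list item from the code above:

    ```csharp
    // Sketch: decode the download as UTF-8 instead of the ANSI default.
    // Requires: using System.Net; using System.Text;
    CloudBlockBlob blob = (CloudBlockBlob)item;
    string blobcontent;
    using (WebClient client = new WebClient())
    {
        client.Encoding = Encoding.UTF8; // used when no charset is in the response header
        blobcontent = client.DownloadString(blob.Uri);
    }
    ```

    Alternatively, recent versions of the storage SDK let you pass an encoding to `CloudBlockBlob.DownloadText`, which avoids `WebClient` entirely.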