locked
Euro encoding in UTF8 RRS feed

  • Question

  • I'm trying to encode the euro symbol for use in an CSV file format.  Unfortunatley Excel does not seem to be able to correctly show the symbol when using UTF8. If I go up to Unicode (UTF16), Excel 2007 correctly displays it, but Excel 2003 does not recognize that encoding.

    I'm using Response.Write to write to an aspx page in the webb so I can't really monkey with the file and/or the write stream. 

    I have changed the encoding on the Content with
    Response.ContentEncoding = new UTF8Encoding(true);
           and also have tried
    Response.ContentEncoding = Systemt.Text.Encoding.UTF8;  // also tried .Unicode;

    Which correctly sets the encoding for the page, but agains my concern in UTF8 which is a requirement.

    As far as the character has gone I've attempted the following lines:
    string currencySymbol = "€";  //displays    €
               and
    string currencySymbol = "" + ((char)128);  //displays    â‚€
               and
    string currencySymbol = "" + ((char)8364);  //displays    €
              and
    string currencySymbol = "\u2034";  //displays    ‴


    It seems to me that the UTF8 encoding parser for Excel is not correctly reading the 3rd byte required for the euro symbol.
    Tuesday, May 20, 2008 2:16 PM

Answers

  • I solved it using something derived from Robert's reply.  I knew what byte I wanted to show up in the OutputStream, and I knew where. I just needed to force it out.  Here's my solution:

    //This is done to correctly display the euro
                byte[] euroByte = new byte[1];
                euroByte[0] = (byte)128;
                bool isAfterFirst = false;
                foreach (string item in printString.Split('€'))
                {
                if (isAfterFirst)
                          {
                          Response.OutputStream.Write(euroByte, 0, 1);
                          }
                          Response.Write(item);
                          Response.Flush();
                          isAfterFirst = true;
               }
               Response.Write("\r\n");


    I haven't cleaned up the implementation and inefficiences yet, but the important bits are shown. 
    I insert the euro as I do all the other currency symbol using the StringBuilder. 
    Split the outputted string from the builder
    Write the first string
    Flush the output stream to ensure that the order is correct
    If there is a second string, write the euro before it, and write the next string (repeat as necessary)
    Finish off the line.



    Wednesday, May 21, 2008 4:40 PM
  • I wrote this:

    protected void BtExport_Click(object sender, EventArgs e) 
        Response.Clear(); 
        Response.AddHeader("content-disposition""attachment; filename=Buoni.csv"); 
        Response.ContentType = "text/csv"
        Response.ContentEncoding = System.Text.Encoding.Default; 
        Response.Write(CreateCSVFromGridView(GWBuoni)); 
        Response.End(); 
     
    string CreateCSVFromGridView(GridView gw) 
        StringBuilder sb = new StringBuilder(); 
        int ColCount = gw.Columns.Count; 
     
        //write header 
        for (int i = 0; i < ColCount; i++) 
        { 
            TableCell cell = gw.HeaderRow.Cells[i]; 
            sb.Append(cell.Text); 
            if (i < ColCount - 1) sb.Append(";"); 
        } 
        sb.AppendLine(); 
     
     
        //write all rows 
        foreach (GridViewRow row in gw.Rows) 
        { 
            for (int i = 0; i < ColCount; i++) 
            { 
                TableCell cell = row.Cells[i]; 
                sb.Append(Server.HtmlDecode(cell.Text)); 
                if (i < ColCount - 1) sb.Append(";"); 
            } 
            sb.AppendLine(); 
        } 
        return sb.ToString(); 

    and it works perfectly.


    ciao!
    • Proposed as answer by Harish Paladugu Tuesday, October 27, 2009 11:44 AM
    • Marked as answer by Caddre Tuesday, October 27, 2009 8:43 PM
    Friday, November 14, 2008 11:03 AM

All replies

  • You're saying you are writing the output to an ASPX file, but you actually need a CSV file. That's a little confusing.

     

    When you write UTF8 output to the browser, you also have to tell the output document that it's using UTF8 by adding the following to the <head> section of the ASPX/HTML file:

     

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

    Tuesday, May 20, 2008 2:34 PM
  • I should have mentioned that I did try adding the charset to the header. Here is a snipped version of page_load:
    DateTime from = Convert.ToDateTime(Request.QueryString["fromDate"]);
              DateTime to = Convert.ToDateTime(Request.QueryString["toDate"]);
              to = to.AddDays(1); //add the last day to get orders from the end date
              OrderStatus orderStatus = OrderStatus.GetById(Nint.Parse( Request.QueryString["statusId"]));
               
               //Set up csv file
               string filename = "attachment; filename=order_download_" + ..date info as name.. + ".csv";
     Response.AddHeader("content-disposition", filename);
     Response.Clear();
     Response.ContentType = "application/text; charset=UTF-8";
     Response.ContentEncoding = new UTF8Encoding(true);
                           
                //Set up header info for csv
                string headerString = "\"Order ID\",";
                headerString += "\"Order Date\",";
                headerString += "\"Order Total\",";
        ....
                Response.Write(headerString);
      Response.Write("\r\n");

                //Write the lines for each order
                string splitter = "\",\"";
                string beginner = "\"";
                string ender = "\"";
              
                foreach (Order order in Order.GetByDateRange(from, to))
                {
                    OrderState os = order.CurrentState;
                    char euro = (char)128;
                    string currencySymbol = order.GetCountryGroup().CurrencySymbol;
                    if (currencySymbol == "€") currencySymbol = "" + euro; //"\u2034";
                    if (os != null)
                    {
                        if (orderStatus == null || os.StatusId == orderStatus.Id)
                        {
                            string orderString = beginner;
                            orderString += order.Id.ToString() + splitter;                      
                  orderString += os.OrderDate.ToString() + splitter;
                  orderString += currencySymbol + os.OriginalTotal.ToString("n") + splitter;
                            ...
                   foreach (OrderItem oi in os.OrderItems)
                             {
                                string itemString = oi.ProductNo + splitter;
                                itemString += oi.ProductName.Replace(((char)174 + ""), "") + splitter;
                                itemString += oi.SKU + splitter;
                                itemString += oi.Quantity + splitter;
                                itemString += currencySymbol + oi.Price.ToString("n") + splitter;
                                itemString += ender;
                                Response.Write(orderString + itemString);
                                Response.Write("\r\n");
                              }
                }
            }
        }
        Response.End();


    This format has worked well for me in the past with not only Excel but many 3rd party software products.
           
    Tuesday, May 20, 2008 4:03 PM
  • There should be no need to change the currency symbol. Just send "€". Response.Write() will handle the conversion just fine.

     

    What happens if you leave out this line?:

     

       if (currencySymbol == "€") currencySymbol = "" + euro; //"\u2034";

    Tuesday, May 20, 2008 4:08 PM
  • I get the pipe and accented a - '€' output.

    SQL does have the correct character stored but using that directly does not work.  (MS SQL 2000 database to match my client's enviroment).
    Tuesday, May 20, 2008 4:37 PM
  • More interesting diagnosis.  When the Response.Write occurs there is no character between the " and the €.  But reading the stream out in UTF8, I see a ASCII character 194 in front of the ASCII character 128.


    Tuesday, May 20, 2008 9:41 PM
  • You really need to use a StringBuilder instead of doing all that inefficient string concatenation.

     

    Setting the ContentEncoding property makes the browser aware of the encoding of the stream, but doesn't do any conversion for you; that you must do yourself.

     

    e.g.

     

    foreach (OrderItem item in os.OrderItems)

    {

      StringBuilder outputText = new StringBuilder();

      // ... Add stuff to outputText via outputText.Append() and outputText.AppendFormat()

      Response.Write(Encoding.Convert(UnicodeEncoding, UTF8Encoding, Encoding.Unicode.GetBytes(outputText.ToString())));

    }

     

    You should also remove the conversion of the euro character and any symbol-related conversions and just output the text that it's in the fields as it appears, the encoding conversion will take care of making it look right to the browser.

     

    Lastly, if all this is doing is outputing a file, you probably want to write an HttpHandler so that your system doesn't go through the unnecessary overhead of processing an aspx file.

    Wednesday, May 21, 2008 12:42 AM
  •  

    The euro symbol is a two-byte character in UTF-8, so you'll see two bytes. You're not seeing ASCII character 194 and ASCII character 128. You're seeing byte value 194 and byte value 128, big difference. This article may be of some help.
    Wednesday, May 21, 2008 12:47 AM
  •  Robert C. Barth wrote:

    Setting the ContentEncoding property makes the browser aware of the encoding of the stream, but doesn't do any conversion for you; that you must do yourself.



    That's not true. Any string you write to the HttpResponse object will be converted to a byte stream using the encoding you specified in the ContentEncoding property.

    In other words, the following two code fragments result in the same output:

    string output = "€";

    Response.ContentEncoding = Encoding.UTF8;
    Response.Write(output);

    and:

    string output = "€";

    Response.ContentEncoding = Encoding.UTF8;
    Response.WriteBytes(Encoding.UTF8.GetBytes(output));



    Wednesday, May 21, 2008 8:07 AM
  • Actually I did misspeak.  I meant to say ANSI characters 194 the A circumflex and 128 the Euro.  I realized the 3rd byte in the mix is what is screwing up the reading of my csv file by Excel, but figured someone somewhere had to run into this problem before.

    Using the Encoding.Convert function did have the same result as set the ContentEncoding.  I had to change the call a bit to use it since GetBytes doesn't play well with a string (needs a char[]) and Write will output the object itself.
    Response.OutputStream.Write(Encoding.Convert(Encoding.Unicode, Encoding.UTF8, Encoding.Unicode.GetBytes(outputText.ToString().ToCharArray())),0,sb.Length+1);
    Not sure if the extra bit at the end was necessary, but wanted it there in case it didn't catch the terminator.
                               

    As far as the StringBuilder v Concatenation, I personally preder the readability of concatenation.  This report download is run once a month and started out much smaller (just Order Id, Order Date, and Order Total).   I do agree with its size now, it would be much more efficient to use StringBuilder.


    I think at this point I'm going to call it Excel's problem since the ouput stream does have the correct 3 bytes in it.  I'll just write an Excel macro to translate the '€' that gets written (where I started originally using the character that comes from the SQL Database) into €.


    Thaank you both for your help.
    Wednesday, May 21, 2008 1:57 PM
  • Refer to http://technet.microsoft.com/en-us/library/bb742422.aspx

     

    "add a Response.Charset property on the second line of your ASP page (Response.Charset="UTF-8") to tell the browser to expect UTF-8 data"

    Wednesday, May 21, 2008 4:37 PM
  • I solved it using something derived from Robert's reply.  I knew what byte I wanted to show up in the OutputStream, and I knew where. I just needed to force it out.  Here's my solution:

    //This is done to correctly display the euro
                byte[] euroByte = new byte[1];
                euroByte[0] = (byte)128;
                bool isAfterFirst = false;
                foreach (string item in printString.Split('€'))
                {
                if (isAfterFirst)
                          {
                          Response.OutputStream.Write(euroByte, 0, 1);
                          }
                          Response.Write(item);
                          Response.Flush();
                          isAfterFirst = true;
               }
               Response.Write("\r\n");


    I haven't cleaned up the implementation and inefficiences yet, but the important bits are shown. 
    I insert the euro as I do all the other currency symbol using the StringBuilder. 
    Split the outputted string from the builder
    Write the first string
    Flush the output stream to ensure that the order is correct
    If there is a second string, write the euro before it, and write the next string (repeat as necessary)
    Finish off the line.



    Wednesday, May 21, 2008 4:40 PM
  • Anonymous:  I had set the charset to utf-8 and that didn't help.  See these lines from above:

    Response.ContentType = "application/text; charset=UTF-8";
    Response.ContentEncoding = new UTF8Encoding(true);


    Wednesday, May 21, 2008 4:46 PM
  • I wrote this:

    protected void BtExport_Click(object sender, EventArgs e) 
        Response.Clear(); 
        Response.AddHeader("content-disposition""attachment; filename=Buoni.csv"); 
        Response.ContentType = "text/csv"
        Response.ContentEncoding = System.Text.Encoding.Default; 
        Response.Write(CreateCSVFromGridView(GWBuoni)); 
        Response.End(); 
     
    string CreateCSVFromGridView(GridView gw) 
        StringBuilder sb = new StringBuilder(); 
        int ColCount = gw.Columns.Count; 
     
        //write header 
        for (int i = 0; i < ColCount; i++) 
        { 
            TableCell cell = gw.HeaderRow.Cells[i]; 
            sb.Append(cell.Text); 
            if (i < ColCount - 1) sb.Append(";"); 
        } 
        sb.AppendLine(); 
     
     
        //write all rows 
        foreach (GridViewRow row in gw.Rows) 
        { 
            for (int i = 0; i < ColCount; i++) 
            { 
                TableCell cell = row.Cells[i]; 
                sb.Append(Server.HtmlDecode(cell.Text)); 
                if (i < ColCount - 1) sb.Append(";"); 
            } 
            sb.AppendLine(); 
        } 
        return sb.ToString(); 

    and it works perfectly.


    ciao!
    • Proposed as answer by Harish Paladugu Tuesday, October 27, 2009 11:44 AM
    • Marked as answer by Caddre Tuesday, October 27, 2009 8:43 PM
    Friday, November 14, 2008 11:03 AM
  • (If I go up to Unicode (UTF16), Excel 2007 correctly displays it, but Excel 2003 does not recognize that encoding.)

    That is because Excel 2003 and below require UTF 7 Encoding to display none English and special characters  Try the code below.



    Response.Charset = "" to
    Response.ContentEncoding =
    System.Text.Encoding.UTF7



    Asp.net MVP, MCPD Web C#, MCTS TFS, MCITP BI and DBA
    Friday, November 14, 2008 6:41 PM
  • Excel 2003 is permanently and unavoidably broken when it comes to direct launch of a file from a browser with unicode characters in a CSV file; however, you can load the CSV file from disk using the File Open dialog, and Excel 2003 will do the right thing. This is a bug in Excel 2003.

    Excel 2007 can be made to work with UTF8 or UTF16.  To get Excel 2007 to interpret the file as UTF-8, you must put a Unicode Byte-order marker (0xFEFF, UTF-8-encoded, of course) in the file. Excel 2007 (and many other applications, as well) will sniff the first three bytes to determine how the file has been encoded. If the byte-order marker is present, Excel can tell from how it was encoded, whether the file is UTF-8, or UTF-16 (and whether the byte order is little- or big-endian).  If there's no byte order marker, Excel assumes that the file is in local codepage.

    Pushing UTF8 and Unicode through a browser download can be slightly challenging. There are multiple conflicting indicators, not least of which is the media type, and character encoding in the HTTP headers. (Response.MediaType and Response.CharacterEncoding, or something similar). The guaranteed way to do it is to push the content as binary-encoded data. But I think if you experiment with setting the HTTP headers, you'll get the right encoding anyway.

    The only real catch, is that you must ALSO push the byte order marker as the first character of the CSV file, in addition to setting the character encoding in the HTTP response.

    The surefire way to verify that you have download the bits properly withot somebody recoding it in transit is to load the resulting file in Notepad. Click on File/Save As... and the Encoding listbox at the bottom of the save dialog will indicate the file type, which will be determined by byte order markers on the saved file.

    Once you have downloaded the file in appropriate format, you're at the mercy of Excel. Excel 2007 will do the right thing, and respect the byte order marker. Excel 2003 will not.

    You might want to try TSV instead. I think TSV works better in Excel 2003.




    Monday, November 17, 2008 6:03 PM
  • You can render Excel 2003 with the code below even in Chinese and Hebrew because I have helped others do it and the users come back and thank me so I am not sure what you are talking about.  If your file size is large or you are working with Excel Workbook these are two separate issues large files require the max length property of Asp.net adjusted and workbooks require pages of code not short response object code.  And Microsoft in Office 2003 and below require Asp.net to just open Office files not accept write which creates some of the problems.

    Response.ContentEncoding =
    System.Text.Encoding.UTF7





    Asp.net MVP, MCPD Web C#, MCTS TFS, MCITP BI and DBA
    Monday, November 17, 2008 6:31 PM
  • Hi Licantrop0,

    Thanks for posting the solution.

    I tried searching a lot and nothing worked but your solution worked like magic.

    I had to do few changes though.

    protected void WriteDataTableToStream(DataTable dtDataToWrite, string dataFileName)
    {
      Response.Clear();
      Response.AddHeader(
    "Content-Disposition", "attachment; filename=" + dataFileName);  
      Response.ContentType =
    "text/csv";
      Response.ContentEncoding = System.Text.
    Encoding.Default;
      Response.Write(CreateCSVFromTable(dtDataToWrite));
      Response.End();
    }

    string
    CreateCSVFromTable(DataTable dt)
    {
      StringBuilder sb = new StringBuilder();
      int ColCount = dt.Columns.Count;

      for
    (int i = 0; i < ColCount; i++)
      {
         object o = dt.Columns[i].ColumnName;
         sb.Append(GetWriteableValue(o));
         if (i < ColCount - 1)  sb.Append(",");
      }
      sb.AppendLine();

      foreach
    (DataRow row in dt.Rows)
      {
        for (int j = 0; j < ColCount; j++) 
        {
           object o = row[j].ToString();
           sb.Append(Server.HtmlDecode(GetWriteableValue(o)));
           if (j < ColCount - 1)  sb.Append(",");
        }
        sb.AppendLine();
      }
      return sb.ToString();
    }

    public
    static string GetWriteableValue(object o)
    {
      if (o == null || o == Convert.DBNull)
         return "";
      else if (o.ToString().IndexOf(",") == -1)
         return o.ToString();
      else
         return "\"" + o.ToString() + "\"";
    }

    Thanks,

    Harish

     

    Tuesday, October 27, 2009 11:47 AM
  • Server.HtmlDecode(cell.Text)); good solution.
    Monday, October 17, 2011 9:47 AM