Converting a string from UTF-8 to ASCII or ANSI

  • Question

  • Hi!

    I have a problem converting a string from UTF-8 to ASCII or ANSI

    Here is the String:

    "Auspuffanlage "Century" f├╝r"

    The text comes from a MySQL database running UTF-8

    The result should be:

    "Auspuffanlage "Century" für"

    Does anyone have an idea how I can convert the UTF-8 text into ASCII or ANSI, so that I get the desired result?

    Best Regards

    Bernd

    Friday, July 7, 2017 4:17 PM

All replies

  • Hi Bernd Riemke,

    Thank you for posting here.

    For your problem, I created a demo which converts UTF-8 to ASCII; please take it as a reference.

    string input = "Auspuffanlage \"Century\" f├╝r";
    // Encode the string as UTF-8 bytes, then convert those bytes to ASCII.
    var utf8Bytes = Encoding.UTF8.GetBytes(input);
    var asciiBytes = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, utf8Bytes);
    // Print each resulting byte value; non-ASCII characters become 63 ('?').
    foreach (var item in asciiBytes)
    {
        Console.Write(item + " ");
    }
    

    The result is the following picture.

    Best Regards,

    Wendy


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, July 19, 2017 1:32 AM
    Moderator
  • You can convert the text like the following:

    Encoding utf8 = Encoding.UTF8;
    Encoding ascii = Encoding.ASCII;
    
    string input = "Auspuffanlage \"Century\" f├╝r";
    string output = ascii.GetString(Encoding.Convert(utf8, ascii, utf8.GetBytes(input)));

    But the problem with your requirement is getting the "├╝" converted to "ü".

    That is a custom conversion, which does not have anything to do with the encoding used. With ASCII encoding, every non-ASCII character is replaced with the "?" character, so "├╝" becomes "??". I should also point out that "ü" is not an ASCII character either, so it too would become "?".
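
    The replacement behaviour described above can be checked with a minimal sketch. The class and method names (`AsciiFallbackDemo`, `ToAscii`) are made up for illustration, and the non-ASCII characters are written as `\u` escapes so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

static class AsciiFallbackDemo
{
    // Hypothetical helper: encodes text as ASCII and decodes it again.
    // Characters outside the 7-bit range come back as '?', which is the
    // default replacement used by Encoding.ASCII.
    public static string ToAscii(string text)
    {
        byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(asciiBytes);
    }

    static void Main()
    {
        Console.WriteLine(ToAscii("f\u251C\u255Dr")); // garbled form "f├╝r" -> f??r
        Console.WriteLine(ToAscii("f\u00FCr"));       // intended form "für" -> f?r
    }
}
```

    Either way the umlaut is lost, so plain ASCII cannot produce the result the question asks for.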

    Wednesday, July 19, 2017 8:51 AM
  • In UTF-8, every code point from 0 to 127 is stored in a single byte. Code points 128 and above are stored using 2, 3, or 4 bytes (the original design allowed up to 6, but the current standard limits UTF-8 to 4). So English text looks exactly the same in UTF-8 as it did in ASCII. Code points beyond 127, such as the German umlauts, have no equivalent in ASCII because of its 7-bit limit, so they are replaced by the general placeholder character. For details, refer to 'The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets'.
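
    As a rough illustration of the byte counts mentioned above, the following sketch asks .NET how many bytes a few sample characters need in UTF-8. The names `Utf8LengthDemo` and `Utf8ByteCount` are invented for this example.

```csharp
using System;
using System.Text;

static class Utf8LengthDemo
{
    // Hypothetical helper: number of bytes the text occupies in UTF-8.
    public static int Utf8ByteCount(string text) => Encoding.UTF8.GetByteCount(text);

    static void Main()
    {
        Console.WriteLine(Utf8ByteCount("A"));          // U+0041, ASCII letter: 1
        Console.WriteLine(Utf8ByteCount("\u00FC"));     // U+00FC, u-umlaut: 2
        Console.WriteLine(Utf8ByteCount("\u20AC"));     // U+20AC, euro sign: 3
        Console.WriteLine(Utf8ByteCount("\U0001F600")); // U+1F600, emoji: 4

        // The two bytes of the u-umlaut, shown in hex: C3-BC
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("\u00FC")));
    }
}
```

    Those two bytes, C3 and BC, are what gets displayed as "├╝" when something reads them with an MS-DOS code page instead of UTF-8.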

    wizend

    Wednesday, July 19, 2017 9:58 AM
  • Hi!

    I have a problem converting a string from UTF-8 to ASCII or ANSI


    Bernd,

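    Be aware that ASCII and ANSI are not the same.

    ASCII is a 7-bit code, while the ANSI code pages are 8-bit.

    ANSI therefore has more characters, so please tell us which of the two you want.

    Be aware also that "ANSI" is not one fixed character set: the American MS-DOS code page 437 is strictly an OEM code page, Windows "ANSI" usually means code page 1252, and Microsoft itself calls the term a misnomer: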

    ANSI: Acronym for the American National Standards Institute. The term “ANSI” as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community. The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft—which became International Organization for Standardization (ISO) Standard 8859-1. “ANSI applications” are usually a reference to non-Unicode or code page–based applications.

    However, a solution in code alone probably won't help you. In cases like this the problem is usually that the u-umlaut is a character from the Western European code page 1252 (German, English, Dutch, Spanish, etc.), but the computer in use is not set to that code page.

    The kind of application can also cause the text to be shown the way you see it.

    So simply give us more information than you have so far.
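
    To make the difference between these code pages concrete, here is a small sketch that prints the bytes of the u-umlaut in each of them. The names `CodePageDemo` and `HexBytes` are invented for this example. The `Encoding.RegisterProvider` call is an assumption about running on .NET Core or .NET 5+, where legacy code pages are not available by default; on the classic .NET Framework it should not be needed.

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    // Assumption: on .NET Core / .NET 5+ the legacy code pages (437, 1252)
    // must be registered before Encoding.GetEncoding can find them.
    static CodePageDemo() => Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Hypothetical helper: the bytes of `text` in the given code page, as hex.
    public static string HexBytes(int codePage, string text) =>
        BitConverter.ToString(Encoding.GetEncoding(codePage).GetBytes(text));

    static void Main()
    {
        string umlaut = "\u00FC"; // u-umlaut

        Console.WriteLine(HexBytes(20127, umlaut)); // US-ASCII: 3F (the '?' placeholder)
        Console.WriteLine(HexBytes(1252, umlaut));  // Windows "ANSI" 1252: FC
        Console.WriteLine(HexBytes(437, umlaut));   // MS-DOS OEM 437: 81
        Console.WriteLine(HexBytes(65001, umlaut)); // UTF-8: C3-BC
    }
}
```

    The same character has a different byte value in every 8-bit code page, which is why the answer depends on which "ANSI" is actually wanted.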


    Success
    Cor





    Wednesday, July 19, 2017 10:10 AM
  • If the problem is still active, try this workaround too:

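    string input = "Auspuffanlage \"Century\" f├╝r";
    // Re-encode the garbled text as code page 437 to recover the original UTF-8 bytes.
    var bytes = Encoding.Convert( Encoding.Unicode, Encoding.GetEncoding(437), Encoding.Unicode.GetBytes( input ) );
    // Decode those bytes as UTF-8, which turns "├╝" back into "ü".
    string result = Encoding.UTF8.GetString( bytes );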
    

    It may also be possible to fix the problem on the MySQL side. See also the COLLATE operator.
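
    The workaround above relies on "├╝" being the UTF-8 bytes of "ü" read as code page 437. The following sketch shows that round trip in both directions. `MojibakeRepair`, `Garble` and `Repair` are names invented here, and the provider registration is an assumption about running on .NET Core or .NET 5+.

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Assumption: on .NET Core / .NET 5+ code page 437 must be registered first.
    static MojibakeRepair() => Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Reproduces the damage: UTF-8 bytes misread as code page 437.
    public static string Garble(string text) =>
        Encoding.GetEncoding(437).GetString(Encoding.UTF8.GetBytes(text));

    // Undoes the damage: back to the raw bytes via code page 437,
    // then decode them as the UTF-8 they really were.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(Encoding.GetEncoding(437).GetBytes(garbled));

    static void Main()
    {
        string garbled = Garble("f\u00FCr");               // "für" becomes "f├╝r"
        Console.WriteLine(garbled == "f\u251C\u255Dr");    // True
        Console.WriteLine(Repair(garbled) == "f\u00FCr");  // True
    }
}
```

    If the text is already garbled when it leaves the database driver, fixing the connection character set is likely the cleaner solution than repairing strings afterwards.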

    Wednesday, July 19, 2017 10:28 AM
  • Hi!

    I will test it!

    More about the string:

    This is the string text:

    Auspuffanlage "Century" f├╝r

    there is no "\"...

    Best regards

    Bernd

    Thursday, July 20, 2017 7:40 AM
  • This is the stringtext:

    Auspuffanlage "Century" f├╝r

    there is no "\"...

    To write a quote character as a string in C# you need to escape it (see Escape Sequences).

    That is why it is written as:

    string input = "Auspuffanlage \"Century\" f├╝r";

    You can also write it like this:

    string input = @"Auspuffanlage ""Century"" f├╝r";
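
    A short sketch may help show that both spellings produce the same text and that no backslash ends up in the string itself. The class name `QuoteEscapingDemo` and its helpers are made up for illustration.

```csharp
using System;

static class QuoteEscapingDemo
{
    // Regular literal: the quote is escaped with a backslash in source code.
    public static string Escaped() => "Auspuffanlage \"Century\"";

    // Verbatim literal: the quote is doubled instead.
    public static string Verbatim() => @"Auspuffanlage ""Century""";

    static void Main()
    {
        Console.WriteLine(Escaped());                  // Auspuffanlage "Century"
        Console.WriteLine(Escaped() == Verbatim());    // True
        Console.WriteLine(Escaped().Contains("\\"));   // False: the backslash exists only in source
    }
}
```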

    Thursday, July 20, 2017 8:32 AM
  • Hi PeiWHoward,

    >>This is the stringtext: Auspuffanlage "Century" f├╝r   there is no "\"...

    If you do not have a " symbol in your string, you don't need the \ symbol. Please refer to the following link, which explains why the \ symbol has to be added.

    https://msdn.microsoft.com/en-us/library/h21280bw.aspx

     

    Best Regards,

    Wendy



    Monday, July 24, 2017 6:15 AM
    Moderator