Unicode characters

  • Question

  • Hi,

    In text format, Chinese words show up as characters like 'ä»ÉÆrÆW¤¤¥'. I want to know the range of their ASCII values. It seems I can't get that value with Convert.ToInt32("ñ"). Any advice?

    Friday, March 7, 2008 2:03 AM

Answers

  •  HuaMin Chen wrote:

    Good day Eric.

    I'm reading it as a text file ONLY, and I'm processing each line character by character. The point is, I need the ASCII value of each CHAR (including those that make up whole CHINESE words).



    Basic ASCII covers only 128 characters (values 0 to 127), including the \0 character.  You can't represent Chinese characters in ASCII.
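
    To illustrate the point above, here is a minimal sketch of what happens when a Chinese character is forced through the ASCII encoding in .NET. The class and method names are made up for this example; the only framework call used is Encoding.ASCII.GetBytes, which by default substitutes '?' for anything outside 0 to 127.

```csharp
using System;
using System.Text;

public static class AsciiLimitDemo
{
    // Hypothetical helper: encodes text as 7-bit ASCII. Characters outside
    // 0..127 cannot be represented and are replaced by '?' (byte value 63).
    public static byte[] ToAsciiBytes(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }

    public static void Main()
    {
        // "\u4E2D" is the Chinese character 中, written as an escape so the
        // example does not depend on the source file's own encoding.
        foreach (byte b in ToAsciiBytes("A\u4E2D"))
            Console.WriteLine(b); // 65 for 'A', then 63 ('?') for the Chinese character
    }
}
```

    The original character cannot be recovered from the 63, which is why an ASCII value for a Chinese character is not a meaningful thing to ask for.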
    Tuesday, March 11, 2008 7:50 AM

All replies

  • Hi,

     

    There are two things:

     

    1. You need a font that supports the glyphs. To confirm that, go to Start > Control Panel > Regional and Language Options and look for "Install files for East Asian languages".

     

    2. In your application, you need to use a font that supports Unicode. Try Tahoma or Arial Unicode and check the results.

     

     

    HTH,
    Suprotim Agarwal

    -----
    http://www.dotnetcurry.com
    http://www.sqlservercurry.com
    -----

    Friday, March 7, 2008 2:56 AM
  • Yes, we can choose 'Unicode' to be able to read Chinese, but that is not the point. In plain text format I see something like ´¼¯à, which represents 2 Chinese words, on my XP PC (English version).

     

    My question is how to get the ASCII values of the characters above.

    Friday, March 7, 2008 3:17 AM
  • The problem is that ASCII doesn't encode Chinese. It can encode only a very limited number of characters, and Chinese easily exceeds that limit, which is why Chinese is handled in Unicode, with its far larger number of encodable characters.

    http://en.wikipedia.org/wiki/ASCII

    There were also extended, ASCII-like code pages for Chinese characters (936 and 950), based on the multi-byte encodings GBK and Big5. So technically those characters don't exist in standard ASCII, and certainly not in the way an English XP PC would be configured.

    If you're just interested in the raw values, things like BitConverter and Encoding would be of interest to you.

    What exactly is your goal? Are you trying to compose the Chinese words from their byte parts (displayed in ASCII), or are you trying to do something with the byte parts of the existing Chinese words? Are you trying to open a text view or text reader of a Chinese document on an English Windows?
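
    As a rough sketch of the "raw values" route mentioned above, the Encoding class can turn a string into its bytes under a chosen encoding. The class and method names here are invented for the example, and UTF-8 and UTF-16 are picked only because they are built in; the file in question may well use a different encoding.

```csharp
using System;
using System.Text;

public static class RawBytesDemo
{
    // Hypothetical helpers that expose the raw byte values of a string.
    public static byte[] Utf8Bytes(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    public static byte[] Utf16LeBytes(string text)
    {
        return Encoding.Unicode.GetBytes(text); // Encoding.Unicode is little-endian UTF-16
    }

    public static void Main()
    {
        string text = "\u4E2D\u570B"; // the two characters 中國
        Console.WriteLine(string.Join(" ", Utf8Bytes(text)));    // 228 184 173 229 156 139
        Console.WriteLine(string.Join(" ", Utf16LeBytes(text))); // 45 78 11 87
    }
}
```

    For the Big5 or GBK code pages, Encoding.GetEncoding(950) or Encoding.GetEncoding(936) plays the same role on the .NET Framework; newer .NET versions need the code pages encoding provider registered first.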
    Friday, March 7, 2008 8:53 PM
  • Thanks Eric!


    I only need the ASCII value of each BYTE part (one character at a time; each Chinese word is made up of 2 characters). Encoding works on the whole byte sequence!

     

    Any other advice for what I need?


    Monday, March 10, 2008 3:07 AM
  • Are you reading it from a file? Keyboard input? A string literal inside your program?

    You may be able to use BinaryReader with a FileStream, or perhaps a MemoryStream, to read single bytes of a larger object.

    BitConverter should also work once you have the value in memory, to convert the whole variable into an array of its byte parts.
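
    A minimal sketch of the file-reading idea above, assuming the goal is simply to list every byte value in the file. It calls FileStream.ReadByte directly rather than going through BinaryReader, and the class name, method name and command-line path are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileBytesDemo
{
    // Hypothetical helper: reads a file one byte at a time and returns the
    // values (each in the range 0..255) in file order.
    public static List<int> ReadByteValues(string path)
    {
        var values = new List<int>();
        using (FileStream stream = File.OpenRead(path))
        {
            int b;
            while ((b = stream.ReadByte()) != -1) // ReadByte returns -1 at end of stream
                values.Add(b);
        }
        return values;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: pass the path of the text file to inspect.");
            return;
        }

        foreach (int value in ReadByteValues(args[0]))
            Console.WriteLine(value);
    }
}
```

    No decoding happens here, so a two-byte Big5 character simply shows up as two consecutive values.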
    Monday, March 10, 2008 2:37 PM
  • Good day Eric.

    I'm reading it as a text file ONLY, and I'm processing each line character by character. The point is, I need the ASCII value of each CHAR (including those that make up whole CHINESE words).

    Tuesday, March 11, 2008 3:01 AM
  •  HuaMin Chen wrote:
    I'm reading it as a text file ONLY. And I'm processing the line character by character.


    Ni hao,

    I'm not really sure what the trouble is, here.  If you just read from the Stream byte by byte, there shouldn't be any troubles.  Of course, the data you read in is going to be subject to encoding (UTF8, UTF16, whatever your file is encoded in).  But you'll get multibyte characters one byte at a time.
    Observe the following:

    Code Snippet

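                // Requires: using System; using System.IO; using System.Text;
                MemoryStream file = new MemoryStream();
                StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false)); // false = no byte order mark
                writer.Write("中國");
                writer.Flush();

                file.Position = 0;

                // Each of these Chinese characters becomes three UTF-8 bytes, read back one at a time.
                for(int pos = 0 ; pos < file.Length ; pos++)
                {
                    int value = file.ReadByte();
                    Console.WriteLine("Byte {0} = {1}", pos, value);
                }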


    Tuesday, March 11, 2008 7:48 AM
  • Hi,

     

    Here's an interesting read:

     

    The ASCII character range is 0 - 127, while the Unicode character range is at least 0 - 65535.

    Several encodings map Unicode sequences to byte sequences: UTF-8 maps Unicode strings to sequences of bytes in the range 0..255, and UTF-7 maps Unicode strings to sequences of bytes in the range 0..127. You *could* read the latter as ASCII sequences, but this is not correct.
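
    The "at least" above matters in .NET: a char is a 16-bit UTF-16 code unit (0 to 65535), and characters above that range take two chars (a surrogate pair). A small sketch, with an invented class and method name, using char.ConvertToUtf32 to get the full code point:

```csharp
using System;

public static class CodePointDemo
{
    // Hypothetical helper: returns the Unicode code point that starts at the
    // given index, combining a surrogate pair into one value when needed.
    public static int CodePointAt(string text, int index)
    {
        return char.ConvertToUtf32(text, index);
    }

    public static void Main()
    {
        string bmp = "\u4E2D";        // 中, fits in a single 16-bit char
        string beyond = "\U00020000"; // a CJK character above 65535
        Console.WriteLine("{0} {1}", bmp.Length, CodePointAt(bmp, 0));       // 1 20013
        Console.WriteLine("{0} {1}", beyond.Length, CodePointAt(beyond, 0)); // 2 131072
    }
}
```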

     

    http://www.velocityreviews.com/forums/t365332-p-how-to-get-the-ascii-code-of-chinese-characters.html

     

     

    HTH,
    Suprotim Agarwal

    -----
    http://www.dotnetcurry.com
    http://www.sqlservercurry.com
    -----

    Tuesday, March 11, 2008 8:35 AM
  • Thanks to all for some relevant info!

     

    To make it simple, I only wanted the corresponding value of Unicode characters.
    It doesn't seem to work with 'Convert.ToString(Convert.ToInt32("Ï"))'. Any advice?

     

    I only want to know whether Convert.ToInt32 can give the values of Unicode characters that fall in the range 128 to 65535. I actually need to apply some encryption to such characters. For the range 0 to 127, on the other hand, there should be no problem, should there?

     

    LAST FRIDAY WAS A VERY BAD BLACK FRIDAY AS I COULD NOT POST ANYTHING IN MSDN!

    Friday, March 14, 2008 2:21 AM
  • Is it impossible to do that? I wonder about this.
    Tuesday, March 18, 2008 1:36 AM
  • From the MSDN: Convert.ToInt32 - "This method supports the .NET Framework infrastructure and is not intended to be used directly from your code."

    I personally have no idea what behaviour you'll get by passing a string (which, by all accounts, can NOT be converted into a number) into that function.
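
    For reference, a short sketch of how this behaves: the string overload of Convert.ToInt32 parses decimal digits, so a string such as "Ï" raises a FormatException, while a char can be converted to its numeric value with a plain cast (or the char overload) and back again. The class and method names are made up for the example.

```csharp
using System;

public static class CharValueDemo
{
    // Hypothetical helpers: a char is a 16-bit UTF-16 code unit, so casting
    // it to int yields a value in the range 0..65535, and the cast back
    // restores the same char.
    public static int ToCodeUnit(char c)
    {
        return (int)c;
    }

    public static char FromCodeUnit(int value)
    {
        return (char)value;
    }

    public static void Main()
    {
        char c = '\u00CF'; // Ï
        int value = ToCodeUnit(c);
        Console.WriteLine(value);                    // 207
        Console.WriteLine(Convert.ToInt32(c));       // 207, the char overload gives the same value
        Console.WriteLine(FromCodeUnit(value) == c); // True

        try
        {
            Convert.ToInt32("\u00CF"); // the string overload expects digits, so this fails
        }
        catch (FormatException)
        {
            Console.WriteLine("FormatException");
        }
    }
}
```

    The value 207 is the UTF-16 code unit, not a byte in any particular file encoding, so it will not necessarily match what another system reports for the same character.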

    If you want to get the individual bytes of a Unicode-encoded string, write the string onto a stream, such as a MemoryStream, then read bytes back off it.

    Try something like this.

    Code Snippet

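    using(MemoryStream buffer = new MemoryStream())
    {
      // Little-endian UTF-16 without a byte order mark, so the first pair read
      // back is the first character rather than the BOM (255, 254).
      using(StreamWriter writer = new StreamWriter(buffer, new UnicodeEncoding(false, false)))
      {
        writer.Write("Can only type chinese on my other windows box :(");
        writer.Flush();

        buffer.Position = 0;

        // Each UTF-16 code unit occupies two bytes, low byte first.
        for(int i = 0; i < buffer.Length; i += 2)
        {
          int low = buffer.ReadByte();
          int high = buffer.ReadByte();
          Console.WriteLine("Character {0} = [{1},{2}]", (i/2)+1, low, high);
        }
      }
    }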


    Tuesday, March 18, 2008 10:38 AM
  • It is very simple: for Unicode characters in the range 128 ~ 65535, I expect to get the same character back (in this case "Ï") from:

     

    Convert.ToString(Convert.ToInt32("Ï"))

     

    I don't know why my request is becoming complicated!

     

    One easy example for my case is

     

    16:05:06 SQL> select ascii('Ï') from dual
    16:05:43   2  /

    ASCII('Ï')
    ----------
         50063

    16:05:43 SQL>           
    16:05:43 SQL> select chr(ascii('Ï')) from dual
    16:05:43   2  /

    C
    -
    Ï

     

    I'm able to achieve this within Oracle.

     

    Any ambiguity for my question? Any advice? Is there anything unclear? Is it really so hard for this?
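
    One possible reading of the Oracle output above: 50063 is 0xC38F, and C3 8F are the two UTF-8 bytes of "Ï", so ascii() appears to return the character's bytes in the database character set packed into one number. Assuming that character set is UTF-8, the sketch below reproduces the same round trip in C#; the class and method names are invented, and this is not a built-in .NET equivalent of ascii() and chr().

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class OracleAsciiDemo
{
    // Hypothetical helper: packs the UTF-8 bytes of a single character into
    // one number, most significant byte first. Intended for one character only.
    public static long Utf8PackedValue(string singleChar)
    {
        long result = 0;
        foreach (byte b in Encoding.UTF8.GetBytes(singleChar))
            result = (result << 8) | b;
        return result;
    }

    // Hypothetical helper: reverses Utf8PackedValue by unpacking the bytes
    // and decoding them as UTF-8.
    public static string FromUtf8PackedValue(long value)
    {
        var bytes = new List<byte>();
        do
        {
            bytes.Insert(0, (byte)(value & 0xFF));
            value >>= 8;
        } while (value > 0);
        return Encoding.UTF8.GetString(bytes.ToArray());
    }

    public static void Main()
    {
        long value = Utf8PackedValue("\u00CF"); // Ï
        Console.WriteLine(value);                                   // 50063
        Console.WriteLine(FromUtf8PackedValue(value) == "\u00CF");  // True
    }
}
```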

    Wednesday, March 19, 2008 8:01 AM
  •  HuaMin Chen wrote:
    I don't know why my request is becoming complicated!


    Perhaps it's become complicated because you're using a .NET method specifically not intended for use to do something that doesn't make a lot of sense...

    Sorry I couldn't be of more help; however, I think I'm going to wash my hands of this. If someone else wants to tackle it, by all means.  Take care.
    Monday, March 24, 2008 9:48 PM
  • Hi,

    I do see what you've said, but it doesn't seem ideal for something that can easily be done in PL/SQL! REALLY NOT GOOD FOR THIS!

    Tuesday, March 25, 2008 5:24 AM
  •  

    Is there any way to convert a Chinese number using Convert.ToInt32(chinese number)? I'm in the middle of a program that is throwing errors like "Input string was not in a correct format". Could anyone kindly reply as soon as possible?
    Monday, June 16, 2008 1:51 PM