Unicode character storage in string array issue

  • Question

  • I have written a brief application to practice the French language.  It is a flash card application and takes its source text from a plain text file created in WordPad.

    As we know, the French character set includes a few characters beyond the basic Latin alphabet, and these are present in many of the MS TrueType fonts (following the Unicode standard).

    When I create my source file in WordPad, the various characters are faithfully rendered whether entered from the keyboard or copied from "Character Map", so I feel confident that the encoding is correct in the source file.

    Skipping forward to the problem at hand: the text box controls that finally display my French words render the text with a symbol indicating that a letter is not being recognized correctly.

    I invoked the debugger to view the contents of the string array elements that store the French word and the English translation.  As the attached screenshot "NEXT Button code in debug.jpg" shows, the special character is already wrong just as it is about to be displayed in the text box control. (See the screenshot of the debugger during "...N_Button_Click".) Note that the backIndex code is only there to save the indexes of the last three practice lines.

    For those who may be inclined to ask how I read the file lines into the application, I submit a screenshot of that code, heavily commented.  (See the attached screenshot of "...ToolStripMenuItem_Click".)

    I would like ideas on what might be going wrong.  I wonder if I have missed a property or definition that sets the Unicode specification before the characters are stored in the array, or perhaps simply before they are displayed.  Are Unicode characters stored differently in a string?

    I found it interesting that the "Français" button text had no trouble rendering the special character "ç".  That means control text must be capable of rendering the characters.

    Friday, December 27, 2013 6:02 PM

All replies

  • Your program reads the file via new StreamReader(@".\\Regular Verbs.txt"). The StreamReader(string) constructor uses the UTF-8 encoding, but WordPad probably encodes the file in code page 1252 instead.  To fix this mismatch, either construct the StreamReader with the correct encoding (e.g. Encoding.GetEncoding(1252)), or save your file in the UTF-8 encoding.  I don't know whether WordPad can use UTF-8, but Notepad sure can.
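The mismatch described above can be reproduced in a few lines. This is a minimal sketch, not the poster's actual code: the class and method names (FlashCardFileReader, Windows1252, ReadAsUtf8, ReadWithEncoding) are hypothetical, and it assumes the file really was saved in code page 1252. On .NET Core and .NET 5+, code page 1252 is only available after registering CodePagesEncodingProvider; the .NET Framework in use at the time of this thread has it built in.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper illustrating the two ways of opening the file.
static class FlashCardFileReader
{
    // Returns the Windows-1252 (Western European "ANSI") encoding.
    public static Encoding Windows1252()
    {
        // Needed on .NET Core / .NET 5+, where code page encodings are opt-in.
        // On the classic .NET Framework 1252 is built in and this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // StreamReader(string) decodes as UTF-8 unless a byte order mark says otherwise.
    public static string ReadAsUtf8(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return reader.ReadToEnd();
        }
    }

    // Passing the encoding explicitly makes the reader decode the bytes as written.
    public static string ReadWithEncoding(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

If a file containing "Français" was saved in code page 1252, ReadAsUtf8 should return the word with the "ç" replaced by the Unicode replacement character (U+FFFD), because the single byte 0xE7 is not a valid UTF-8 sequence on its own, while ReadWithEncoding(path, FlashCardFileReader.Windows1252()) should return the word intact. That would match the symptom seen in the debugger: the string is already damaged when it is read, before any control displays it.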

    By the way, you need not double the backslash in @".\\Regular Verbs.txt". The @ sign already tells the compiler not to treat backslashes as special, so you can just write @".\Regular Verbs.txt". I'm a bit surprised that the file system accepted the double backslash.
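The difference between the literals is easy to check. The sketch below uses a hypothetical PathLiterals class purely to hold the three spellings side by side.

```csharp
// Hypothetical container comparing three spellings of the same relative path.
static class PathLiterals
{
    // Verbatim literal: both backslashes are kept, so the string contains ".\\Regular Verbs.txt".
    public const string VerbatimDoubled = @".\\Regular Verbs.txt";

    // Verbatim literal with a single backslash: the intended path.
    public const string VerbatimSingle = @".\Regular Verbs.txt";

    // Regular literal: "\\" is an escape sequence that produces one backslash.
    public const string RegularEscaped = ".\\Regular Verbs.txt";
}
```

VerbatimSingle and RegularEscaped are the same string, while VerbatimDoubled is one character longer. Windows path handling generally collapses repeated separators, which is probably why the doubled form still opened the file.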

    Friday, December 27, 2013 7:07 PM
  • ranta,

    Thank you for your thoughts.  I will study the "Encoding.GetEncoding" documentation to learn how to apply this strategy.  I will also open the source file or create a new source file in "WordPad" and save that.  I also have "Word 97" available but do not know if there are advantages to use it as my editor.

    • Marked as answer by 02biker Saturday, December 28, 2013 5:01 PM
    Saturday, December 28, 2013 4:03 PM
  • For others who may look for an answer here:

    I chose to open my source file in "Notepad" as ranta suggested and found that "UTF-8" is listed as an encoding option when saving.  I also opened it in "Word 97" and, after keying "UTF-8" into the help utility, found that it is also possible to save the content currently in the editor in that format.

    This is good to know when an application reads or writes text that may have been created outside the application.

    Saturday, December 28, 2013 5:08 PM
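The same re-save can also be done from code instead of from an editor. This is a sketch under assumptions: Utf8Converter and ConvertToUtf8 are hypothetical names, and the caller is assumed to know the file's original encoding (for example code page 1252, as suggested earlier in the thread).

```csharp
using System.IO;
using System.Text;

// Hypothetical helper that rewrites a text file as UTF-8.
static class Utf8Converter
{
    // Decodes sourcePath with sourceEncoding and writes the text to targetPath
    // as UTF-8 with a byte order mark, so later readers can detect the encoding.
    public static void ConvertToUtf8(string sourcePath, string targetPath, Encoding sourceEncoding)
    {
        string text = File.ReadAllText(sourcePath, sourceEncoding);
        File.WriteAllText(targetPath, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    }
}
```

Once the file has been converted, the original new StreamReader(path) call should read the accented characters correctly without naming an encoding, since that constructor decodes as UTF-8.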