Why can't I convert multi-byte text to Unicode using the Encoding class after reading a DBF through the OLE DB provider?

  • Question

  • Hi, I use an OLE DB provider to read DBF files. The files contain text in foreign languages that use different code pages, such as code page 936 for Simplified Chinese.

    new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + projectDir + ";Extended Properties=DBase IV;");

    For some reason, when I try to read the Chinese text from the DBF, I can't call GetBytes; doing so throws an invalid cast exception. I can, however, use GetChars and GetValue(...).ToString(). This is my code:

    string oriString = reader.GetValue(1).ToString();

    local = Encoding.GetEncoding(codePages[j]).GetString(Encoding.UTF8.GetBytes(oriString));

    Here codePages[j] is 936 for Chinese. The result in "local" is not the Chinese characters I'm expecting. Why doesn't the OLE DB provider let me use GetBytes? (I believe that GetChars or GetValue(...).ToString() decodes the bytes with the wrong encoding and gives back rubbish.) Is there anything wrong with my encode-then-decode code?

    I'm using Windows 7 Enterprise 64-bit. Does the operating system make any difference?
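
    The suspicion above can be checked without a DBF file. The sketch below is only an illustration under stated assumptions: it assumes the provider decoded the raw code page 936 bytes with a single-byte code page (Latin-1, 28591, is used here as a stand-in; the code page the Jet provider really uses is not known from this thread). The class and method names (CodePageRecovery, MisDecode, Recover) are hypothetical. It shows that re-encoding the garbled string with UTF-8 does not give back the original bytes, while re-encoding it with the same code page that produced the garbling does.

```csharp
using System;
using System.Text;

public static class CodePageRecovery
{
    static CodePageRecovery()
    {
        // Needed on .NET Core / .NET 5+ so that code page 936 is available.
        // On .NET Framework the code page is built in and this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical stand-in for the provider: decodes raw bytes with the wrong code page.
    public static string MisDecode(byte[] rawBytes, int wrongCodePage)
    {
        return Encoding.GetEncoding(wrongCodePage).GetString(rawBytes);
    }

    // Undo the wrong decode: get the raw bytes back, then decode with the real code page.
    public static string Recover(string garbled, int wrongCodePage, int realCodePage)
    {
        byte[] rawBytes = Encoding.GetEncoding(wrongCodePage).GetBytes(garbled);
        return Encoding.GetEncoding(realCodePage).GetString(rawBytes);
    }

    public static void Main()
    {
        string expected = "\u4E2D\u6587"; // two Chinese characters
        byte[] rawBytes = Encoding.GetEncoding(936).GetBytes(expected);

        // Assumption: the provider decoded the bytes as Latin-1 (28591).
        string garbled = MisDecode(rawBytes, 28591);

        // The approach from the question: UTF-8 bytes decoded as code page 936.
        string viaUtf8 = Encoding.GetEncoding(936).GetString(Encoding.UTF8.GetBytes(garbled));
        string recovered = Recover(garbled, 28591, 936);

        Console.WriteLine(viaUtf8 == expected);   // False
        Console.WriteLine(recovered == expected); // True
    }
}
```

    This only works when the wrong decode was lossless, which holds for Latin-1 because every byte value maps to exactly one character. If the provider substituted replacement characters for bytes it could not decode, the original bytes cannot be recovered this way.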


    • Edited by Anakinbo Wednesday, May 8, 2013 1:45 AM
    Wednesday, May 8, 2013 1:32 AM

All replies

  • Hi Anakinbo,

    Welcome to the MSDN Forum.

    How about trying it this way:

    local = Encoding.GetEncoding(codePages[j]).GetString(Encoding.Unicode.GetBytes(oriString));

    If it doesn't work, could you upload a test DBF file for me to test with?

    Thanks.


    Mike Feng
    MSDN Community Support | Feedback to us
    Develop and promote your apps in Windows Store
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Wednesday, May 8, 2013 9:23 AM
    Moderator
  • I've tried that. It doesn't work. I can't provide you with the DBF file; it's company data.
    Thursday, May 9, 2013 12:30 AM
  • Why are you using UTF8 instead of Unicode in the line below?

    Encoding.GetEncoding(codePages[j]).GetString(Encoding.UTF8.GetBytes(oriString));

    A .NET string stores every character as one or more two-byte UTF-16 code units, so Chinese characters and the carriage return and line feed (0x0D and 0x0A) alike take two bytes in memory. When you call GetBytes with UTF8, the characters are re-encoded as variable-length sequences (one byte each for carriage return and line feed, three bytes for most Chinese characters), so you lose the two-byte layout that Encoding.Unicode would preserve.

    Also, when reading from or writing to the database, you must specify that the character columns are NVARCHAR.
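
    As a rough check on how many bytes each encoding produces for the same string, here is a minimal sketch. The class name ByteCounts and the sample string (one Chinese character followed by a carriage return and line feed) are made up for illustration and are not taken from the original poster's data.

```csharp
using System;
using System.Text;

public static class ByteCounts
{
    static ByteCounts()
    {
        // Needed on .NET Core / .NET 5+ so that code page 936 is available.
        // On .NET Framework the code page is built in and this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Number of bytes the given encoding needs for the text (no byte order mark).
    public static int Count(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }

    public static void Main()
    {
        string text = "\u4E2D\r\n"; // one Chinese character, then CR and LF

        Console.WriteLine(Count(text, Encoding.Unicode));          // 6: two bytes per character
        Console.WriteLine(Count(text, Encoding.UTF8));             // 5: three bytes + one + one
        Console.WriteLine(Count(text, Encoding.GetEncoding(936))); // 4: two bytes + one + one
    }
}
```

    The three encodings give three different byte sequences for the same string, so bytes produced by one of them cannot be decoded meaningfully with another.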


    jdweng

    Thursday, May 9, 2013 6:11 AM
  • I've tried that. It doesn't work. I can't provide you with the DBF file; it's company data.

    I don't need the real database; could you create a test one?

    Also, how about trying OleDbDataReader.GetString: http://msdn.microsoft.com/en-us/library/system.data.oledb.oledbdatareader.getstring.aspx

    Best regards,


    Mike Feng

    Friday, May 10, 2013 11:05 AM
    Moderator