Character encoding issue from C++ to C#

  • Question

  • Hi,

    We are currently facing an issue with passing of a string from a COM C++ dll to a C# dll.
    The issue is this: in the C++ code, we pass across a string (BSTR) that contains the character '„'. (Note that this is not a regular double quote.)

    We need to use the integer value of this character for some purpose.
    In C++, this character's integer value is -124.

    But when it is passed to C#, the integer value of this character becomes 8222.
    Can you please explain how we can retrieve the original value (-124)?

    We're guessing this could be an encoding issue, but even Encoding.ASCII.GetBytes(str) does not give us the correct value we're looking for.

    In case you're wondering why we're playing around with the integer value of the character, we need it for some binary operations at the hardware level.

    Any help would be much appreciated!

    Thank You!

    Friday, June 17, 2011 6:59 AM

Answers

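  • Use the encoding that matches the code page in use on the C++ side (1250, 1252, or whichever one you use) to get your data.

                // Encode with the ANSI code page (1252 here), then reinterpret the byte as signed.
                byte[] data = Encoding.GetEncoding(1252).GetBytes("„");
                Console.WriteLine("value: {0}", unchecked((sbyte)data[0]));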
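
    For anyone who needs the same lookup on the C++ side, here is a minimal sketch of what the conversion above amounts to, assuming code page 1252. The helper names cp1252_byte_from_unicode and signed_char_value are made up for illustration. The table covers only the 0x80..0x9F range, where code page 1252 differs from Latin-1. On Windows you would normally let the OS do this conversion instead of hand-rolling a table.

```cpp
#include <cassert>
#include <cstddef>

// Unicode code points for code page 1252 bytes 0x80..0x9F.
// 0 marks the five bytes that code page 1252 leaves unassigned.
static const char32_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};

// Hypothetical helper: returns the code page 1252 byte (0..255) for a
// Unicode code point, or -1 if the character has no byte in this table.
int cp1252_byte_from_unicode(char32_t cp) {
    // 0x00..0x7F (ASCII) and 0xA0..0xFF (Latin-1) map to themselves.
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
        return static_cast<int>(cp);
    }
    for (std::size_t i = 0; i < 32; ++i) {
        if (kCp1252High[i] != 0 && kCp1252High[i] == cp) {
            return 0x80 + static_cast<int>(i);
        }
    }
    return -1;
}

// Hypothetical helper: the value a signed 8-bit char reports for a byte.
int signed_char_value(int byte) {
    assert(byte >= 0 && byte <= 255);
    return byte >= 128 ? byte - 256 : byte;
}
```

    With these, signed_char_value(cp1252_byte_from_unicode(0x201E)) gives -124, because U+201E sits at byte 0x84 (132) in code page 1252.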


    • Edited by boothwine Friday, June 17, 2011 12:01 PM can more codepages...
    • Marked as answer by Santosh Tatpati Monday, June 20, 2011 6:49 AM
    Friday, June 17, 2011 12:00 PM

All replies

  • Have a look at System.Text.Encoding.Convert(); this may help.

    I think with BSTR you are dealing with wide strings, so it looks like your default encoding is Unicode. Maybe try using MarshalAs(UnmanagedType.BStr) in your C# P/Invoke declaration.

    You may need to use Convert.FromBase64String() in C# to get back.

    It may help if you can give some coding examples of the string in C++ and how you are passing it to C#.


    Friday, June 17, 2011 7:57 AM
  • Hi Marchant,

    Thanks for the reply.
    I'm not sure it's a marshalling issue.
    The reason I say that is that the other characters (ordinary letters) are fine. It's only this one character, '„'.

    As a coding example, I tried out the following two lines in a Cpp App project and a C# App project:

    static void main()
    {
        char chr = '„';
        int intChr = (int)chr;
    }

    I again get the same result:

    That is,

    in the Cpp program the value of intChr is -124,

    whereas in the C# program the value of intChr is 8222.

    I'm guessing I should do a 'lossy conversion' in C# to get the value -124, but I'm not sure how :(

    Friday, June 17, 2011 10:20 AM
  • On my Windows 7 32-bit Enterprise machine, using Character Map with Arial Unicode MS as the font, I see that character 8222 (or 0x201E) is exactly the character you show in your post.

    Besides, AFAIK, characters are unsigned, so the value of -124 doesn't really ring any bells for me.


    MCP
    Friday, June 17, 2011 1:51 PM
  • A signed byte of -124 is the same as an unsigned 132 (0x84).

    Character 132 in several code pages is the character the OP is writing about. So he is using a plain char, which holds an ANSI character from his computer's current code page. When this gets translated to a BSTR, it ends up as the Unicode character 0x201E.

    So the OP only needs to translate the Unicode string back to the correct code page. That is all. (I hope)
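
    The -124 versus 132 arithmetic can be checked with a small C++ sketch. It assumes 8-bit bytes, and the name unsigned_byte is made up for illustration. Converting a signed char to unsigned char is well defined in C++ and adds 256 to negative values.

```cpp
#include <cassert>

// Hypothetical helper: maps a signed byte value (-128..127) to its
// unsigned form (0..255) by going through signed char and unsigned char.
int unsigned_byte(int signed_value) {
    assert(signed_value >= -128 && signed_value <= 127);
    return static_cast<unsigned char>(static_cast<signed char>(signed_value));
}
```

    Here unsigned_byte(-124) yields 132, which is 0x84, the byte the C++ char actually holds.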

    Friday, June 17, 2011 2:38 PM
  • Ah, ok.  That makes sense.  I guess the OP should have transmitted the data as a byte SAFEARRAY instead of a BSTR to avoid this pain.
    MCP
    Friday, June 17, 2011 3:01 PM
  • Brilliant!! Thank you Marchant, Boothwine and WebJose!!! :)

    Just found out that we were using code page 1252. That explains the conversion error.
    Monday, June 20, 2011 6:48 AM