P/Invoke: how to get a Unicode string from C# to C++

  • Question

  • I've been using P/Invoke for a long time now, but there is one thing I never quite managed to do: sending a Unicode string from C# to C++.



    That is, given a small test C++ library like this:

    Code Snippet
    #include <iostream>

    using namespace std;

    extern "C"   __declspec(dllexport) void PrintString(const char* string)
    {
        cout << string << endl;
    }

    extern "C"   __declspec(dllexport) void PrintStringW(const wchar_t* string)
    {
        wcout << string << endl;
    }


    And a corresponding C# test program like this:

    Code Snippet

    using System;
    using System.Collections.Generic;
    using System.Text;

    using System.Runtime.InteropServices;

    namespace CSConsole
    {
        class Program
        {

            [DllImport("CPPLibrary", EntryPoint = "PrintString")]
            public static extern void PrintStringIntern(string toPrint);

            [DllImport("CPPLibrary", EntryPoint = "PrintStringW")]
            public static extern void PrintStringInternPtr(IntPtr toPrint);

            [DllImport("CPPLibrary", EntryPoint = "PrintString", CharSet=CharSet.Ansi)]
            public static extern void PrintStringInternA(string toPrint);

            [DllImport("CPPLibrary", EntryPoint = "PrintStringW", CharSet = CharSet.Unicode)]
            public static extern void PrintStringInternW([MarshalAs(UnmanagedType.LPWStr)]string toPrint);

            static void Main(string[] args)
            {            
                // make sure we can see unicode characters
                Console.OutputEncoding = System.Text.Encoding.UTF8;

                // hello world should pass just fine ... and it does
                PrintString("Hello World.");

                // this time I would very much like to see a copyright sign ©
                PrintString("Hello World. " + '\u00A9' + " (unicode)");

                // and finally, how about an é ...
                PrintString("Hello World. " + '\u00E9' + " (unicode)");
                            
                Console.WriteLine(Environment.NewLine +  "(any key to continue)");
                Console.ReadKey();
            }

            static void PrintString(string toPrint)
            {            
                toPrint = toPrint.Normalize(NormalizationForm.FormKC);
                Console.WriteLine("C#:\t\t" + toPrint);
                Console.Write("C++:\t\t");
                PrintStringIntern(toPrint);

                Console.Write("C++ (ansi):\t");
                PrintStringInternA(toPrint);

                Console.Write("C++ (uni):\t");
                PrintStringInternW(toPrint);

                Console.Write("C++ (ptr):\t");
                IntPtr pointer = Marshal.StringToHGlobalUni(toPrint);
                PrintStringInternPtr(pointer);

                Console.WriteLine();
                
            }
        }
    }

    I would very much like to see the same Unicode string in C# and C++.

    Oh ... the output for the above is actually:

    Code Snippet
    C#:             Hello World.
    C++:            Hello World.
    C++ (ansi):     Hello World.
    C++ (uni):      Hello World.
    C++ (ptr):      Hello World.

    C#:             Hello World. © (unicode)
    C++:            Hello World. � (unicode)
    C++ (ansi):     Hello World. � (unicode)
    C++ (uni):      Hello World. � (unicode)
    C++ (ptr):      Hello World. � (unicode)

    C#:             Hello World. é (unicode)
    C++:            Hello World. � (unicode)
    C++ (ansi):     Hello World. � (unicode)
    C++ (uni):      Hello World. � (unicode)
    C++ (ptr):      Hello World. � (unicode)


    (any key to continue)

    I hope someone here can help. Thanks

    Frank
    Wednesday, May 28, 2008 5:55 PM

All replies

  • UNICODE_STRING: a Unicode string (declared in Winternl.h):

    typedef struct _UNICODE_STRING {
        USHORT Length;          // size of the string in Buffer, in bytes
        USHORT MaximumLength;   // allocated size of Buffer, in bytes
        PWSTR  Buffer;
    } UNICODE_STRING;
    typedef UNICODE_STRING *PUNICODE_STRING;
    typedef const UNICODE_STRING *PCUNICODE_STRING;

    Wednesday, May 28, 2008 6:56 PM
  • While I could certainly wrap the structure for P/Invoke like this:

    Code Snippet

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct UNICODE_STRING
    {
        public ushort Length;          // in bytes, matching the native definition
        public ushort MaximumLength;   // in bytes
        public string Buffer;          // marshals as LPWStr under CharSet.Unicode
    }

    and then use it in the P/Invoke signature. But I'm not yet sure how that would help me; this really should be the same as using:

     

    Code Snippet

    [DllImport("CPPLibrary", EntryPoint = "PrintStringW", CharSet = CharSet.Unicode)]
    public static extern void PrintStringInternW([MarshalAs(UnmanagedType.LPWStr)]string toPrint);

    to begin with, right? In the end, I really want a wchar_t* to use with wcout, wprintf or whatever. So even if I got a UNICODE_STRING, I would then have to convert it with something like this:

     

    Code Snippet

    PUNICODE_STRING pustr = [...];
    size_t chars = pustr->Length / sizeof(wchar_t);   // Length is in bytes, not characters
    std::vector<wchar_t> buf(chars + 1, L'\0');       // +1 for the terminator; needs <vector>
    wcsncpy(&buf[0], pustr->Buffer, chars);           // Buffer need not be NUL-terminated
    wprintf(L"%s", &buf[0]);

    Wednesday, May 28, 2008 8:19 PM
  • Then perhaps this will be of help:

    WCHAR: a 16-bit Unicode character (typedef wchar_t WCHAR;). For more information, see Character Sets Used By Fonts.

    The commentary text is a pointer to where to find more information. It is all here:

    http://msdn2.microsoft.com/en-us/library/aa383751(VS.85).aspx
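
    Which matches the managed side: a C# char is also a 16-bit UTF-16 code unit, so a string marshaled as LPWStr maps directly onto a WCHAR buffer. A tiny check (my own illustration, not from the docs):

    Code Snippet

    // e.g., in Main: C# chars are 16-bit UTF-16 code units,
    // the same width as WCHAR/wchar_t on Windows.
    Console.WriteLine(sizeof(char));     // prints 2 (bytes per char)
    Console.WriteLine((int)'\u00A9');    // prints 169, the code point of ©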

    • Marked as answer by jack 321 Monday, June 2, 2008 6:45 AM
    • Unmarked as answer by Frank Bergmann Thursday, July 3, 2008 11:17 PM
    Wednesday, May 28, 2008 8:29 PM
  • Indeed, you were absolutely right ... C# and 16-bit Unicode is no problem at all. The actual problem I was trying to solve was, however, the following:

    - I have a UTF-8 encoded file, which I read in using a StreamReader with Encoding.UTF8, and then pass on to my C library; unfortunately, there it arrives either as ASCII or as UTF-16.

    So the question: can anything be done to make it arrive as UTF-8?
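
    The only workaround I can think of (a sketch of my own, untested) would be to do the UTF-8 encoding in managed code and pass the raw bytes through the narrow entry point, since a byte[] marshals as a pointer to its first element:

    Code Snippet

    // Reuses the existing narrow PrintString export; the C side sees a const char*.
    [DllImport("CPPLibrary", EntryPoint = "PrintString")]
    public static extern void PrintStringUtf8(byte[] toPrint);

    static void PrintUtf8(string toPrint)
    {
        // Encode to UTF-8 ourselves and append the NUL terminator the C side expects.
        byte[] bytes = Encoding.UTF8.GetBytes(toPrint + "\0");
        PrintStringUtf8(bytes);
    }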

    thanks again
    Frank
    Thursday, July 3, 2008 11:17 PM
  • The problem is not actually with your code; it is the console :) Here are a few things to try to demonstrate what I mean:

    Add extra print statements

    extern "C"   __declspec(dllexport) void PrintStringW(const wchar_t* string) 
        wcout <string <endl
     
        wcout << "PrintStringW Hello World. " << '\u00A9' << " (unicode)" <endl
        wcout << "PrintStringW Hello World. " << '\u00E9' << " (unicode)" <endl

    You will notice that the output of the extra print statements is garbled in exactly the same way as the output of the string argument, so the data itself arrives intact.

    Write to file

    extern "C"   __declspec(dllexport) void PrintStringW(const wchar_t* string) 
        wcout <string <endl
     
        wofstream myfile; 
        myfile.open("unicode.txt"); 
        myfile <string <endl
        myfile.close(); 

    This should look exactly as you would expect with both symbols displayed correctly.

    See if that works for you.
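
    As a quick cross-check from the managed side (my own sketch; the file name is arbitrary), you could write the same string out from C# and compare the two files in a diff tool or hex editor:

    Code Snippet

    // Write the string that was passed to the native call from C# as well,
    // next to the unicode.txt the native code produced.
    System.IO.File.WriteAllText("unicode_cs.txt", toPrint, Encoding.UTF8);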
    Friday, July 4, 2008 7:44 AM