C++ API for UTF8 decoding

    Question

  • How do I decode a UTF8 encoded string? I see that ATL has an API to convert from Unicode to UTF8 but I wasn't able to find an API to decode UTF8. I cannot use managed code to do this and am looking for a MS C++ API.

    Thanks in advance.

     

    Thursday, October 13, 2011 3:43 PM

All replies

  • >How do I decode a UTF8 encoded string?

    What do you mean by "decode"?

    Presumably you need to convert it to some other string form - possibly
    Windows Unicode?

    See the MultiByteToWideChar API, or use one of the helpers such as the
    ATL CA2WEX which allows you to specify the code page (CP_UTF8).

    Dave

    Thursday, October 13, 2011 3:59 PM
  • Saying that you want to "decode" a UTF-8 string is pretty unclear. UTF-8 is a way to represent Unicode, or if you will, it is one of the Unicode encodings. Windows itself, however, uses another Unicode encoding called UTF-16. Most likely you want to convert from UTF-8 to UTF-16, because your text is coming from an external system (UTF-8 is more common under Unix, and on the web in general). If so, you should try MultiByteToWideChar with CP_UTF8. Note that you should not try to convert from UTF-8 to any other multi-byte (MBCS) encoding, because each of those generally covers only one language (or a small set of languages), whereas Unicode text can mix any number of languages.
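A minimal sketch of the MultiByteToWideChar approach with CP_UTF8, assuming a Windows build where wchar_t holds UTF-16 code units. The helper name Utf8ToWide is hypothetical, not part of any Microsoft API, and inputs longer than INT_MAX bytes are not handled.

```cpp
#ifdef _WIN32  // MultiByteToWideChar is a Win32 API; this sketch is Windows-only
#include <windows.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 bytes to a UTF-16 std::wstring.
std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();

    const int srcLen = static_cast<int>(utf8.size());

    // First call: pass a null output buffer to learn how many UTF-16
    // code units are needed. MB_ERR_INVALID_CHARS makes the call fail on
    // malformed UTF-8 instead of substituting replacement characters.
    const int dstLen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                           utf8.data(), srcLen, nullptr, 0);
    if (dstLen <= 0) throw std::runtime_error("invalid UTF-8 input");

    // Second call: convert into a buffer of exactly that size.
    std::wstring wide(static_cast<size_t>(dstLen), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), srcLen, &wide[0], dstLen);
    return wide;
}
#endif
```

Because the byte count is passed explicitly rather than as -1, the returned length does not include a terminating null, so the std::wstring can be sized directly from it.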

    HTH.

    Thursday, October 13, 2011 7:26 PM
  • Yeah. The code I had to interface with specified this but I wasn't sure what they needed and was generally looking for all possible APIs/formats I could decode UTF8 into. Turns out they needed this because they used UTF-16 but thankfully I don't need to do this anymore. Thanks so much for your response and I'll keep this in mind the next time I need to do something like this.
    Thursday, October 13, 2011 9:47 PM
  • Thanks, everyone, for the suggestions. Basically someone was expecting that the input for their interface be an ASCII string since their code was written in C# and hence used UTF-16 strings. I don't have to do UTF-8 anymore, but I now have to convert any and all special characters in a string (even those that are URL "safe") into their corresponding URL escape sequences. I tried InternetCanonicalizeUrl and UrlEscape, but those didn't seem to convert characters like + and /. Is there an API that does something like this? I think someone had some sample code that did this, but I would prefer an API.

    http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/454a7884-80e1-41c5-8df7-a35c879bf535/

    Does such an API exist?

     

    Friday, October 14, 2011 4:00 AM
  • I tried AtlEscapeUrl, but that only converts the subset of characters that are considered "ATL unsafe". What I want is an equivalent of the C# API HttpServerUtility.UrlEncode, which converts all special characters into their corresponding URL escape sequences.
    Friday, October 14, 2011 5:40 AM
  • I don’t think you need to convert ‘+’ and ‘/’ because those appear in the browser’s navigation bar just fine?
     
    -- David
     
     
    "jrrtolkienfan" wrote in message news:af11577a-e192-4673-be55-f76f0587f7d5@communitybridge.codeplex.com...

    I tried InternetCanonicalizeUrl and UrlEscape but those didn't seem to convert characters like + and /. Is there an API that does something like this? I think someone had some sample code that did this but I would prefer an API for this


    Efficiently read and post to forums with newsreaders: http://communitybridge.codeplex.com
    Friday, October 14, 2011 5:38 PM
  • Hi David,

    This is not really for use with a browser. My code has to work with somebody's server, and that server requires that I encode the '+' and '/'.

    I've not found any Win32 API that does this thus far.
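In the absence of a ready-made Win32 call, the encoding described here can be hand-rolled. The following is a sketch only, assuming the server accepts standard percent-encoding of every byte outside the RFC 3986 "unreserved" set; PercentEncodeAll is a hypothetical helper name, and its output may differ from .NET's UrlEncode in details such as how spaces are written and the case of the hex digits.

```cpp
#include <string>

// Hypothetical helper, not a Windows or ATL API.
// Percent-encodes every byte except the RFC 3986 "unreserved" characters
// (A-Z, a-z, 0-9, '-', '.', '_', '~'), so '+', '/', '=' and ':' are all escaped.
std::string PercentEncodeAll(const std::string& input)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(input.size() * 3);  // worst case: every byte becomes %XX

    for (unsigned char c : input) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += kHex[c >> 4];    // high nibble
            out += kHex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

The function works on bytes, so non-ASCII text should be converted to UTF-8 (or whatever byte encoding the server expects) before it is passed in.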

     

    Saturday, October 15, 2011 7:16 AM
  • This is not really for use with a browser but somebody's server that my code to work with requires that I encode the '+' and '/'

    Then that server code does not conform to the standard Internet canonicalization. Presumably there is documentation for that server code and how to use it properly. What does it say?
    Saturday, October 15, 2011 3:02 PM
  • Brian, actually the server code is written by someone I work with, but I don't believe they will change the format of the query string they are expecting to see. If this does not conform to the standard, how come UrlEncode does what I want but there isn't a C++ API for it?
    Sunday, October 16, 2011 11:30 AM
  •  
    says ‘+’ is a Reserved character, and it only needs to be encoded “when not used in their special role inside an URL”.  I was going to suggest trying the various flags to UrlEscape, but on the surface there don’t seem to be any that might help.  The .NET method UrlEncode has an overload that takes an Encoding object with various properties.  Are you using that one?  If so, what properties are you using?
     
    -- David
     
    "jrrtolkienfan" wrote in message news:f7b81bb8-d5aa-450a-971f-0a575190b01a@communitybridge.codeplex.com...
    Brian, actually the server code is written by someone I work with but I don't believe they will change the format of the query string they are expecting to see. If this does not conform to the standard, how come UrlEncode will do what I want but there isn't a C++ API for it.

    Sunday, October 16, 2011 4:46 PM
  • Hi David,

    I've tried UrlEscape with the flags that I thought made sense but yes, they don't seem to serve my purpose. Also, this is how I used UrlEncode:

    HttpUtility.UrlEncode("https://google.com/blah/a=DbD83Dk+5tOWibYBJZc3UTtOak0+traahm/3Chp1ZTw1=");

    This seems to work as I expect it to.

    Monday, October 17, 2011 2:03 AM
  • Sorry, I don’t know.  :-(  I agree the behavior of Windows APIs should be consistent.  Have you tried InternetCanonicalizeUrl()?
     
    Good luck,
    David

    Monday, October 17, 2011 2:46 PM

  • Efficiently read and post to forums with newsreaders: http://communitybridge.codeplex.com
    Monday, October 17, 2011 2:52 PM
  • Thanks David. I had already tried passing "DbD83Dk+5tOWibYBJZc3UTtOak0+traahm/3Chp1ZTw1=" alone to both UrlEscape and InternetCanonicalizeUrl but didn't get the result I wanted.
    Wednesday, October 19, 2011 6:26 AM
  • I think you need to put that string after “http://” so that the function thinks it’s part of the address part of the URL and not the parameters.  Because if it’s not in the address part, the reserved characters need not be encoded (according to the standard).
     
    -- David
     
     
    "jrrtolkienfan" wrote in message news:c8340e2a-875f-4571-a692-4d09994f3250@communitybridge.codeplex.com...
    Thanks David. I had already tried passing "DbD83Dk+5tOWibYBJZc3UTtOak0+traahm/3Chp1ZTw1=" alone to both urlescape and internetcanonicalizeurl but didn't get the result I wanted.

    Wednesday, October 19, 2011 11:55 AM