How to configure a const string's encoding as UTF-8?

    Question

  • Hi,

    I'm writing an app for Windows 8.1 using VS 2013, and I've run into a problem.

    If I define a string buffer in my code like this:

    char pStr[] = "中国";

    The Chinese string "中国" is encoded as GBK by default. The string buffer is:

    [image: debugger view of the buffer]

    Interestingly, when I debug a third-party project, I find that the Chinese string is encoded as UTF-8 by default. If I define the same string buffer in that code, it will be:

    [image: debugger view of the buffer]

    I want to know how to configure the default encoding of const strings in VS 2013 for Windows Store apps and Windows Phone Store apps.

    Different encodings of const strings may lead to compatibility issues.

    I have already tried changing the encoding of the source .cpp file to UTF-8, but it did not work.
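
For reference, the two encodings of "中国" differ in both length and byte values, and a small hex dump makes it easy to see which one ended up in a buffer. The following is only a sketch: `HexDump`, `kZhongGuoUtf8`, and `kZhongGuoGbk` are illustrative names, not part of any library, and the literals are written with hex escapes so that they do not depend on how the source file is saved.

```cpp
#include <cstddef>
#include <string>

// Returns the bytes of `s` as space-separated uppercase hex, e.g. "E4 B8 AD".
std::string HexDump(const std::string& s) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += kDigits[b >> 4];
        out += kDigits[b & 0x0F];
    }
    return out;
}

// "中国" spelled out byte by byte (hypothetical names, for illustration only).
// Hex escapes keep the bytes fixed regardless of the source file's encoding.
const char kZhongGuoUtf8[] = "\xE4\xB8\xAD\xE5\x9B\xBD";  // UTF-8: 6 bytes
const char kZhongGuoGbk[]  = "\xD6\xD0\xB9\xFA";          // GBK:   4 bytes
```

Calling `HexDump(pStr)` on the buffer from the question and comparing the result with the two constants shows which encoding the compiler actually emitted.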




    • Edited by Lattimore Tuesday, May 27, 2014 3:02 PM
    Tuesday, May 27, 2014 2:59 PM

All replies

  • Hi Lattimore,

    I'm not sure what you mean by a "default encoding" here. The string is just a buffer, and its data isn't inherently interpreted: the encoding matters at the edges, when the string is read in or written out or when it interacts with other systems. There isn't a concept of default encoding within the app itself (or rather: it's all up to the app). Internally, Windows uses UTF-16.

    The important thing is that when reading data in and writing it out, the app uses an appropriate encoding. If your app is communicating with data sources using UTF-8, then you'll want to make sure you read in and write out UTF-8. Whether the app converts that to another format internally is up to you.

    Tuesday, May 27, 2014 8:14 PM
    Owner
  • The important thing is that when reading data in and writing it out, the app uses an appropriate encoding.

    You are right. I think the better way is to define the string as wchar_t* or std::wstring and call an API to convert the encoding when required.

    But now a problem has me stumped. The same code gives different results in my project and in a third-party project. Both are C++/XAML Windows Store app projects. The code is:

    char* pStr = "中国";
    CCLabelTTF* pLabel = CCLabelTTF::create(pStr, "Arial", 40);

    The third-party API CCLabelTTF::create() takes a char* as its first parameter and requires it to be encoded as UTF-8.

    The code runs well in the third-party project but displays a garbled string in my project. When I debug the two projects, I find that the encoding of 'pStr' is different (see the pictures in my question), so passing it to the API gives different results.
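
One way to catch this kind of mismatch before the string reaches an API that expects UTF-8 is to check whether the bytes are well-formed UTF-8. The sketch below is a heuristic only: `IsValidUtf8` is a hypothetical helper, and some GBK byte sequences can happen to be valid UTF-8, but the GBK bytes of "中国" (D6 D0 B9 FA) are not.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects truncated sequences,
// overlong forms, surrogate code points, and values above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long kMin[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence runs past the end
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len > 1 && cp < kMin[len]) return false;     // overlong encoding
        if (cp > 0x10FFFF) return false;                 // outside Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```

A debug-only check such as `assert(IsValidUtf8(pStr))` in front of the `CCLabelTTF::create()` call would flag the GBK-encoded buffer immediately instead of producing a garbled label.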

    I think better code would look like this:

    wchar_t* pwStr = L"中国";
    // assume ToUTF8 converts a wchar_t string to a UTF-8 char string
    char* pStr = ToUTF8(pwStr, wcslen(pwStr));
    CCLabelTTF* pLabel = CCLabelTTF::create(pStr, "Arial", 40);
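
The `ToUTF8` helper above is left undefined in the post. As a rough, portable sketch of what such a conversion does, the function below (named `WideToUtf8` here, a hypothetical stand-in and not the poster's actual helper) encodes a wide string as UTF-8 by hand. On Windows, the Win32 function `WideCharToMultiByte` with `CP_UTF8` performs the same conversion.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for ToUTF8: converts a wide string to UTF-8.
// Combines surrogate pairs when wchar_t is 16 bits (UTF-16, as on Windows)
// and also accepts whole code points where wchar_t is 32 bits.
std::string WideToUtf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        // High surrogate followed by low surrogate: merge into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Unpaired surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) cp = 0xFFFD;
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this sketch, the result of `WideToUtf8(L"\u4E2D\u56FD")` can be passed to `CCLabelTTF::create()` through `c_str()`, as long as the `std::string` outlives the call. Writing the literal with `\u` escapes keeps it independent of the source file's encoding.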

    But now I want to know 'why', not 'how to do it'.

    Wednesday, May 28, 2014 2:01 AM
  • Hi Rob,

    I found out why the two identical const Chinese strings are encoded differently. In VS 2013, via [File] -> [Advanced Save Options], I can change the encoding of the .cpp file. I just noticed there are two UTF-8 options: one called 'UTF-8 with signature', the other 'UTF-8 without signature'.

    If I choose 'UTF-8 without signature', the code {char* pStr = "中国";} is encoded as UTF-8. If I choose 'UTF-8 with signature', it is encoded as GBK.

    Please tell me the difference between the two UTF-8 options.

    Wednesday, May 28, 2014 3:20 PM
  • 'Without signature' means "UTF-8 without the BOM (byte order mark)".
    • Marked as answer by Lattimore Friday, May 30, 2014 8:49 AM
    Thursday, May 29, 2014 7:17 PM
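
The signature, or BOM, is the three-byte sequence EF BB BF at the very start of a UTF-8 file. As far as I understand the behavior reported in this thread, the compiler uses it as a cue: with the BOM it knows the source is UTF-8 and converts narrow string literals to the system code page (GBK on a Chinese system), while without it the bytes are assumed to already be in the system code page and are copied unchanged, so the UTF-8 bytes survive. A minimal sketch for checking whether a file's contents start with the signature, where `HasUtf8Bom` is a hypothetical helper:

```cpp
#include <string>

// Returns true if `bytes` begins with the UTF-8 byte order mark EF BB BF.
bool HasUtf8Bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}
```

Reading the first few bytes of each project's .cpp file in binary mode and passing them to this function shows which of the two save options a file was written with.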