setlocale problem with code page 65001

    Question

  • Hello!

    I hope someone can help me with the following:

    I've written a console app that forces the locale to UTF-8 (code page 65001). The following code works fine when compiled under VC 7.1. However, the call to setlocale() fails under VC 2005.

        /* Needs <windows.h>, <stdio.h> and <locale.h>. */
        if (!SetConsoleOutputCP(CP_UTF8)) {
            printf("could not set console output code page\n");
            return 1;
        }
        if (!SetConsoleCP(CP_UTF8)) {
            printf("could not set console input code page\n");
            return 1;
        }

        /* Both getters return UINT, hence %u. */
        printf("Console output code page is %u\n", GetConsoleOutputCP());
        printf("Console input code page is %u\n", GetConsoleCP());

        /* This is the call that fails under VC 2005 but succeeds under VC 7.1. */
        char* ploc = setlocale(LC_ALL, ".65001");

    I've read something about 65001 being a pseudo codepage, but I'm not really sure what was meant.

    Could someone explain the change from VC 7.1 to 2005 and how to work around it?
    Monday, November 14, 2005 1:07 AM
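
A failed setlocale() call is reported through a null return value, so the failure described in the question can be detected and handled at run time. The following is a minimal sketch, not a fix for the VC 2005 behavior: `set_first_accepted_locale` is a hypothetical helper name introduced here for illustration, and which locale names a given CRT accepts is an assumption to verify on each toolchain.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>

// Hypothetical helper (not part of the CRT): tries each candidate locale
// name in order and returns the first one that setlocale() accepts, or
// nullptr if every candidate is rejected.
const char* set_first_accepted_locale(const char* const* candidates,
                                      std::size_t count) {
    assert(candidates != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        // setlocale() returns a null pointer when the request cannot be honored.
        if (std::setlocale(LC_ALL, candidates[i]) != nullptr) {
            return candidates[i];
        }
    }
    return nullptr;
}
```

A caller could pass something like `{".65001", ""}` (the empty string selects the user's default locale) and log which name was actually accepted instead of assuming the first one took effect.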

All replies

  • That's curious; setlocale() was changed to explicitly reject CP_UTF7 and CP_UTF8, but I don't know why that change was made.  You can see the code yourself in the CRT sources; it's in a function called __get_qualified_locale() in getqloc.c, which is called by setlocale().

    Monday, November 14, 2005 9:36 PM
  • A bug filed after Beta 1 happened to bring attention to this area: http://lab.msdn.microsoft.com/ProductFeedback/viewFeedback.aspx?feedbackid=b96a7e1b-fbad-486d-9e27-d600884d0c16
    After some discussion, it was decided not to support these code pages, because both CP_UTF8 and CP_UTF7 are pseudo code pages with no corresponding NLS files. Have you been using these code pages?

    Nikola
    VC++
    Monday, November 14, 2005 10:35 PM
  • Thanks for the reply!

    The project I've been working on targets the Windows console and Linux. I originally went with Unicode (wide characters), but then realized that wchar_t is 32 bits under Linux and decided that was unnecessary overhead. Code page 65001 ended up being a great solution for fully supporting Hungarian text.

    I'm trying to minimize the string conversions I have to do. Using UTF-8 makes life very simple, since my data source is UTF-8, which eliminates almost all string conversions.

    It seems my only option at this point is to build a class that abstracts the string handling between the two platforms (Unicode on Win32, UTF-8 on Linux), but of course that will require a significant rewrite, a different code base to maintain, and so on.

    To be honest, I'm not that clear on how NLS and pseudo code pages work. Could you briefly give me an overview? Is there a way to generate an NLS file for UTF-8?

    Interestingly, under VC7 the fgetc(w) functions fail with a multibyte length assertion, something about the length not being == 1 or == 2. Sorry to be vague on this part; if you need more detail, let me know. My workaround was to switch to code page 852, convert to a wide string via MultiByteToWideChar, then convert back to UTF-8 via WideCharToMultiByte.

    Why was support for UTF-8 and UTF-7 dropped in the first place, or was it never truly supported at all?

    Thanks,
    Chris
    Tuesday, November 15, 2005 4:56 PM
  • CP_UTF8 was not supported even back in VS2003. setlocale() would silently accept it, but the CRT APIs would not actually work correctly with this locale. After some review, it was decided to make setlocale() reject this code page, so that users know it should not be used. As for a workaround, your idea of an abstraction class seems workable to me, and I cannot think of anything simpler. I will ask around; if I find a simpler workaround, I will reply with it.

    Nikola
    VC++
    Tuesday, November 15, 2005 7:19 PM
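
One possible building block for the abstraction class discussed in this thread is a UTF-8 decoder that never consults the C locale, so it does not depend on setlocale() accepting code page 65001. The sketch below is an illustration under assumptions, not the approach the VC++ team recommends: `utf8_to_wide` is a hypothetical name (it is not a CRT or Win32 function), and the code assumes wchar_t holds UTF-16 code units when it is narrower than 32 bits and whole code points otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 to std::wstring without using the
// C locale. Throws std::invalid_argument on malformed input.
std::wstring utf8_to_wide(const std::string& in) {
    // Smallest code point that legitimately needs 1, 2, 3, or 4 bytes.
    static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, surrogates, and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");
        assert(cp <= 0x10FFFF);  // invariant: cp is a Unicode scalar value
        i += len;
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            // Fits in one wchar_t (always true where wchar_t is 32 bits).
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            // 16-bit wchar_t: split into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With a helper along these lines, the application could keep its data in UTF-8 on both platforms and convert only at the boundary where a wide-character API is required, which keeps the number of conversions small.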
  • Ok, thanks for your time.

    Chris
    Tuesday, November 15, 2005 11:23 PM
  • The bug is still with us ;(

    Sunday, May 11, 2014 7:13 AM