Text to speech with Chinese characters

  • Question

  • In reference to this post. 

    I am just starting to wrestle with this problem. The referenced post gives the background. I have three questions.

    First, I want to make sure I understand the output of the defined function. Based on the given example, would the input to the text to speech function be "&#597D;&#7684;&#FF01;..."?

    Second, I will have files in UTF-8 format, which is variable length in general, although for Chinese characters I believe it will be 16 bits encoded into 3 bytes. When reading such a file using C/C++, will all the characters just be converted to a fixed 16-bit character format in memory?

    Third, is there a standard function or library call that will do this conversion? It seems a little inconvenient for Microsoft to require an input format nobody supports.


    Friday, June 14, 2019 4:59 AM

All replies

  • Yes, to the Unicode format. Unicode is a character set that assigns each character a number (a code point). UTF-8 and UTF-16 are encodings that turn those numbers into bytes: UTF-8 uses one to four 8-bit units per character, and UTF-16 uses one or two 16-bit units, so both are variable-width. They are different things, and Microsoft has its own usage due to a long history: in Windows documentation "Unicode" often means UTF-16. Unicode is now very widely used and popular with non-Windows developers, so I presume a Unicode ingress format means wider appeal.
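
    To make the difference between a code point and its encodings concrete, here is a minimal C++ sketch. The function names encode_utf8 and encode_utf16 are made up for illustration, and the code assumes it is given a valid code point (it does not reject surrogates or values above U+10FFFF).

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Illustrative only: assumes cp is a valid code point.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Most Chinese characters land here: three bytes.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16 (1 or 2 sixteen-bit units).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        // Outside the Basic Multilingual Plane: surrogate pair.
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 | (cp >> 10));
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
    }
    return out;
}
```

    For U+597D this gives the three UTF-8 bytes E5 A5 BD and the single UTF-16 unit 597D. A code point above U+FFFF, such as U+1F600, needs four UTF-8 bytes and two UTF-16 units, so a fixed 16-bit format in memory cannot hold every character.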

    There are lots of string encoding/decoding methods in most languages, such as System.Web.HttpUtility in .NET. There are loads of libraries for C++, etc., which ye olde search tool will provide.
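
    If a library is not an option, the conversion is small enough to sketch by hand in C++. Reading a UTF-8 file with a plain std::ifstream or fread gives back the raw bytes unchanged, so they have to be decoded explicitly. The helper names decode_utf8 and to_char_refs below are made up for illustration, and the "&#HHHH;" output format is only an assumption based on the string quoted in the question. Whether the text to speech function wants exactly that form is not confirmed here. The decoder rejects truncated input and bad continuation bytes, but does not check for overlong forms or surrogates.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into Unicode code points.
// Throws std::runtime_error on truncated or malformed input.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out += cp;
        i += extra + 1;
    }
    return out;
}

// Hypothetical helper: write each code point as "&#HHHH;" with
// uppercase hex digits, the form assumed from the question.
std::string to_char_refs(const std::u32string& cps) {
    std::string out;
    char buf[16];
    for (char32_t cp : cps) {
        std::snprintf(buf, sizeof buf, "&#%04X;", static_cast<unsigned>(cp));
        out += buf;
    }
    return out;
}
```

    Note that this sketch converts every character, including plain ASCII, so "A" becomes "&#0041;". If ASCII should pass through untouched, add a check for code points below 0x80 in to_char_refs.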


    Tuesday, September 3, 2019 10:12 AM