How to programmatically determine whether a character is an Asian character using the OOXML SDK?

  • Question

  • Hi there,

    We are working on an application that uses the OOXML SDK to find East Asian characters in Word documents, but we are having difficulty deciding whether some special characters count as Asian characters.

    When we use the Word Count feature in Word, the quotation mark (") is counted as a non-Asian character when its language is set to English. However, when its language is set to Chinese, the Word Count report counts it as an Asian character.

    In document.xml, when the language of a character is set to Chinese in Word, the attribute w:hint="eastAsia" is added to the run (on its w:rFonts element). However, not every character in a run with this hint is counted as an Asian character in the Word Count result.

    So my question is: how can we determine whether a character is an Asian character with the OOXML SDK and get exactly the same result as the Word Count feature in Word?

    Thanks in advance.

    Tuesday, August 28, 2012 10:01 AM
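To see which runs carry the hint described in the question, the minimal sketch below reads word/document.xml straight from the .docx package with the Python standard library. This is an assumption for illustration, not the Open XML SDK itself (which is a .NET library). It assumes the hint is stored as w:hint="eastAsia" on the run's w:rPr/w:rFonts element, and the function names read_document_xml and runs_with_hint are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (ECMA-376).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx package."""
    with zipfile.ZipFile(docx_path) as pkg:
        return pkg.read("word/document.xml")


def runs_with_hint(document_xml):
    """Return (hint, text) for every w:r run; hint is None when absent.

    Assumption: the hint lives on w:rPr/w:rFonts as the w:hint attribute.
    """
    root = ET.fromstring(document_xml)
    result = []
    for run in root.iter(W + "r"):
        rfonts = run.find(W + "rPr/" + W + "rFonts")
        hint = rfonts.get(W + "hint") if rfonts is not None else None
        # Concatenate all w:t text nodes that belong directly to this run.
        text = "".join(t.text or "" for t in run.findall(W + "t"))
        result.append((hint, text))
    return result
```

For example, `runs_with_hint(read_document_xml("sample.docx"))` (where sample.docx is a placeholder path) lists each run with its hint, so the hinted runs can be inspected character by character.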

Answers

  • Hi upsky_,

    Thanks for posting in the MSDN Forum.

    First, OOXML stores the document content as XML encoded in UTF-8, so I think you can check whether the text is East Asian with a regular expression. If a paragraph contains other kinds of characters, Word will start a new run for them, so you don't need to worry about that case. However, if your document was created programmatically, a single run may mix many kinds of characters. In that case you need to split the text with a regular expression, and all characters that are not East Asian will be counted as one character if the document counts the characters as Asian characters.

    I hope it can help you.

    Have a good day,

    Tom


    Tom Xu [MSFT]
    MSDN Community Support

    Wednesday, August 29, 2012 1:32 AM
    Moderator
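The regular-expression approach suggested in the answer can be sketched in Python as follows. The Unicode ranges are a simplified, assumed definition of "East Asian", and HINT_DEPENDENT is a hypothetical set of quotation marks that are treated as East Asian only when the run carries w:hint="eastAsia", mirroring the quotation-mark behavior described in the question. This is not Word's documented counting rule and is not guaranteed to reproduce the Word Count result exactly.

```python
import re

# Simplified, assumed set of East Asian ranges; NOT Word's exact rule.
EAST_ASIAN_RANGES = (
    "\u3000-\u303f"  # CJK Symbols and Punctuation
    "\u3040-\u309f"  # Hiragana
    "\u30a0-\u30ff"  # Katakana
    "\u3400-\u4dbf"  # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"  # CJK Unified Ideographs
    "\uac00-\ud7af"  # Hangul Syllables
    "\uff00-\uffef"  # Halfwidth and Fullwidth Forms
)

# Hypothetical: curly quotes whose classification follows the run's w:hint.
HINT_DEPENDENT = "\u2018\u2019\u201c\u201d"


def split_east_asian(text, hint=None):
    """Split text into (is_east_asian, segment) pairs with a regular expression."""
    chars = EAST_ASIAN_RANGES
    if hint == "eastAsia":
        chars += HINT_DEPENDENT
    # Group 1 matches a run of East Asian characters, group 2 everything else.
    pattern = re.compile(f"([{chars}]+)|([^{chars}]+)")
    return [(m.group(1) is not None, m.group(0)) for m in pattern.finditer(text)]


def count_east_asian(text, hint=None):
    """Count the characters that fall in the East Asian segments."""
    return sum(len(seg) for is_ea, seg in split_east_asian(text, hint) if is_ea)
```

Passing each run's text together with its w:hint value lets the same quotation mark be classified differently depending on the run's language hint, while ideographs are always classified as East Asian.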