RichEditBox handles Unicode input from an RTF-encoded string strangely in WinRT

    Question

  • I'm having an issue with RichEditBox: it converts an RTF string that I feed it in a way I did not expect.

    As an example, for the following RTF string:
    {\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}{\f1 courier new}}{\colortbl;}{\pard Hello \u38899; sailor!\par}}

    I expected the \u38899; escape to be left alone when I load this string into the RichEditBox with the following code:

    string richText = @"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}{\f1 courier new}}{\colortbl;}{\pard Hello \u38899; sailor!\par}}";
    richBox.Document.SetText(TextSetOptions.FormatRtf, richText);

    However, when I extract the text with

    richBox.Document.GetText(TextGetOptions.FormatRtf, out currentText);

    I get the following RTF string instead:

    {\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}{\f1\fnil\fcharset134 Meiryo UI;}{\f2\fnil Segoe UI;}}
    {\colortbl ;\red0\green0\blue0;}
    {\*\generator Riched20 6.3.9600}\viewkind4\uc1 
    \pard\ltrpar\fs24 Hello \f1\'d2\'f4 sailor!\par

    \pard\ltrpar\tx720\cf1\fweight200\f2\par
    }

    The RichEditBox displays the Unicode character correctly, and I have no problem with the other formatting changes it makes to my RTF document. The main issue is that it replaces the Unicode escape \u38899; with the encoding \'d2\'f4, and I'm unsure how to convert that back into the \u form.

    Is there a way to tell the RichEditBox to encode Unicode characters in a specific way to get around this, or a way to convert the \'xx encoding back to the \uN; form?

    Tuesday, February 10, 2015 7:43 PM

Answers

  • It is not expected that you'll get the same RTF out of an RTF control that you put in. The original RTF is not saved when it is interpreted, only the results, and new RTF is generated when it is read out.

    I suspect what is happening here is that the original \u38899 represents a character which isn't available in Arial, so the control falls back to the Meiryo UI font for it.

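    The \'xx notation denotes a byte in a code page, and consecutive escapes can form one multi-byte (MBCS) character. Here \'d2\'f4 follows \f1, whose font table entry declares \fcharset134 (GB2312, Windows code page 936), so the two bytes are decoded in that code page rather than in the \ansicpg1252 declared in the RTF header. Together they represent the same character as \u38899 (U+97F3).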
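
    If you need the \uN form back, one option is to post-process the RTF you read out. The following is only a minimal sketch, not something the control provides: the class and method names (RtfHexEscapes, ToUnicodeEscapes) are made up for illustration, and it assumes every \'xx run in the string belongs to a single code page that you pass in as an Encoding. Real RTF can switch fonts (and therefore code pages) mid-document, so a robust version would have to track the active \fN and its \fcharset. It also does not special-case an escaped backslash directly before a quote.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of WinRT or the RichEditBox API.
static class RtfHexEscapes
{
    // One or more consecutive \'xx escapes, e.g. \'d2\'f4.
    static readonly Regex HexRun = new Regex(@"(?:\\'[0-9a-fA-F]{2})+");

    // Rewrites each run of \'xx escapes as \uN? escapes.
    // ASSUMPTION: all runs are encoded in the single code page `encoding`.
    public static string ToUnicodeEscapes(string rtf, Encoding encoding)
    {
        return HexRun.Replace(rtf, match =>
        {
            string run = match.Value;

            // Each escape is exactly 4 characters: \ ' x x
            byte[] bytes = new byte[run.Length / 4];
            for (int i = 0; i < bytes.Length; i++)
            {
                bytes[i] = byte.Parse(run.Substring(i * 4 + 2, 2),
                                      NumberStyles.HexNumber,
                                      CultureInfo.InvariantCulture);
            }

            var sb = new StringBuilder();
            foreach (char c in encoding.GetString(bytes))
            {
                // \uN takes a signed 16-bit decimal value, so code units
                // above 32767 are written as negative numbers. The '?' is
                // the one-character fallback that readers skip under \uc1.
                short signed = unchecked((short)c);
                sb.Append(@"\u")
                  .Append(signed.ToString(CultureInfo.InvariantCulture))
                  .Append('?');
            }
            return sb.ToString();
        });
    }
}
```

    With an Encoding for code page 936, ToUnicodeEscapes(@"Hello \f1\'d2\'f4 sailor!", gbk) should give Hello \f1\u-26637? sailor!, because 38899 - 65536 = -26637. If you prefer the unsigned form used in the question, replace the cast with (int)c. How you obtain that Encoding (for instance Encoding.GetEncoding("gb2312")) depends on the platform, and its availability in a WinRT app is an assumption here.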

    Wednesday, February 11, 2015 2:23 AM
    Owner