Outlook AppointmentItem's RtfBody byte content has incorrect values for non-Latin language

  • Question

  • Issue with Microsoft.Office.Interop.Outlook.AppointmentItem

    AppointmentItem’s RtfBody property is returning some incorrect byte values causing the text entered by the user to be altered. This is even before using any text encoding or RTF parsing.

    The RTF sequence for the text entered in this example is: \'c7\'e1\'d3\'e1\'c7\'e3} but that is not what it should be. The following figure illustrates how some bytes don't correspond to the expected values.

    The letter LAM (E4) becomes FEH (E1)



    The letter MEEM (E5) becomes KAF (E3)



    The consequence is: the correct text is transformed into the incorrect one (see below). All I'm doing here is reading AppointmentItem’s RtfBody then writing again while checking the byte values provided by Outlook.



    The encoding used based on the Outlook’s WordEditor properties is msoEncodingWestern (see below). If I use what I believe is the equivalent in Text.Encoding (i.e. WesternEncoding), I get an incorrect result (similar to what the byte values are but not corresponding to what’s been written on the Outlook window).


    Note: If I use the high-level method SaveAs2 in Word.Document, the RTF still contains the incorrect characters, but when that RTF file is opened in Word (by double-clicking it in Windows Explorer), it displays correctly. That makes me think the Office SDK handles non-Latin languages differently. However, I am not sure which code table is used for that, or which parameters apply when it comes to msoEncodingWestern.

    Examples of code tables for Arabic that have been used in this experiment can be found in the following links:

    http://memory.loc.gov/diglib/codetables/33.html

    http://jrgraphix.net/r/Unicode/0600-06FF

    http://mindprod.com/jgloss/encoding/x-ibm1046.html

    Could you please assist and explain what's happening?

    Regards,

    Hicham





    Monday, June 5, 2017 12:59 PM

Answers

  • Hi everyone,

    UPDATE: I've managed to get it to work. Another problem was the RTF parser I was using (even though I was not modifying the text). I had to make sure the RTF parser was aware of the encoding used by AppointmentItem.

    But that is a problem I've managed to fix. The other problem, for which I am going to post a separate question, is that AppointmentItem.InternetCodepage does not return the encoding that is actually being used in the appointment. I had to hard-code mine (for testing purposes) as Windows-1256 to get the text to show correctly.
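
    To illustrate why the parser has to know the code page, here is a minimal sketch that decodes a run of \'hh escapes with a caller-supplied code page. The names RtfHexEscapes and Decode are hypothetical and are not taken from the RTF parser used in this thread. It assumes the .NET Framework, where Encoding.GetEncoding(1256) works directly; on .NET Core or later, the System.Text.Encoding.CodePages provider would have to be registered first.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Text;

    static class RtfHexEscapes
    {
        // Hypothetical helper: collects the bytes of every \'hh escape in the run
        // and decodes them with the given code page. Other characters are skipped.
        public static string Decode(string run, int codePage)
        {
            var bytes = new List<byte>();
            int i = 0;
            while (i + 3 < run.Length)
            {
                if (run[i] == '\\' && run[i + 1] == '\'')
                {
                    bytes.Add(byte.Parse(run.Substring(i + 2, 2), NumberStyles.HexNumber));
                    i += 4;
                }
                else
                {
                    i++;
                }
            }
            return Encoding.GetEncoding(codePage).GetString(bytes.ToArray());
        }

        static void Main()
        {
            string run = @"\'c7\'e1\'d3\'e1\'c7\'e3";

            // Windows-1256 (Arabic) yields the word that was typed.
            string arabic = Decode(run, 1256);
            Console.WriteLine(arabic == "\u0627\u0644\u0633\u0644\u0627\u0645"); // True

            // A parser that assumes Windows-1252 (Western) yields Latin letters instead.
            string western = Decode(run, 1252);
            Console.WriteLine(western == "\u00C7\u00E1\u00D3\u00E1\u00C7\u00E3"); // True
        }
    }
    ```

    The bytes are identical in both calls; only the code page handed to the decoder changes the resulting text, which is why hard-coding 1256 made the appointment display correctly.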

    The new question can be found here: https://social.msdn.microsoft.com/Forums/office/en-US/9b06de36-15cc-41f1-9ee4-d5a26a234c20/outlook-appointmentitems-internetcodepage-does-not-return-an-accurate-value-if-arabic-or-russian-is?forum=outlookdev

    Thanks all for your support.

    Hicham


    • Edited by Hisham7G Friday, July 7, 2017 4:14 PM
    • Marked as answer by Hisham7G Friday, July 14, 2017 1:44 PM
    Friday, July 7, 2017 4:09 PM

All replies

  • Hi Hisham7G,

    What's salamByteValue? I tried to get an AppointmentItem's RtfBody property in Outlook 2016 and got a byte array of about 40,000 bytes. So how do you get salamByteValue, and which version of Outlook are you using? Could you tell me whether you have added any additional language packs? I will do my best to reproduce your issue in the same environment; more details would be helpful.

    Best Regards,



    Wednesday, June 7, 2017 2:35 AM
    Moderator
  • Hi Celeste Li,

    Thanks for your reply. The byte array will indeed have thousands of elements (mine had over ten thousand) because it is the byte equivalent of the RTF content, which is massive. Regarding the other question, I don't have any external or additional language pack in .NET. I am merely using Arabic as a second language at the OS level (alongside English) and testing with an Arabic word.

    I have Outlook 2013. To reproduce, create a new appointment or meeting (not an email), type in السلام (copy/paste it) with nothing more (no signature or greeting), then get the RtfBody. You'll get thousands of bytes, most of which are RTF formatting. To locate the area corresponding to السلام using a loop combined with conditions, look for the sequence 92 39 99 55 92, which corresponds to \'c7\ (it should be the same on your end). Then compare the values that follow with the ones I drew in red in the first figure (see above) to see whether you are getting the byte equivalent of \'c7\'e1\'d3\'e1\'c7\'e3} or not. If you share the byte values from the first \'c7 up to \'e3}, we'll take it from there.
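
    The search described above can be sketched as follows. This is only an illustration: the byte array is a hypothetical stand-in for appointment.RTFBody, and RtfByteSearch and IndexOf are made-up names.

    ```csharp
    using System;
    using System.Text;

    static class RtfByteSearch
    {
        // Returns the index of the first occurrence of pattern in data, or -1 if absent.
        public static int IndexOf(byte[] data, byte[] pattern)
        {
            for (int i = 0; i <= data.Length - pattern.Length; i++)
            {
                int j = 0;
                while (j < pattern.Length && data[i + j] == pattern[j]) j++;
                if (j == pattern.Length) return i;
            }
            return -1;
        }

        static void Main()
        {
            // Hypothetical stand-in for appointment.RTFBody.
            byte[] rtf = Encoding.ASCII.GetBytes(@"{\rtf1\ansi{\f0 \'c7\'e1\'d3\'e1\'c7\'e3}}");

            byte[] marker = { 92, 39, 99, 55, 92 }; // the ASCII bytes of \'c7\
            int start = IndexOf(rtf, marker);
            if (start < 0)
            {
                Console.WriteLine("Marker not found.");
                return;
            }

            // Dump everything from the marker up to the next closing brace.
            int end = Array.IndexOf(rtf, (byte)'}', start);
            if (end < 0) end = rtf.Length - 1;
            Console.WriteLine(BitConverter.ToString(rtf, start, end - start + 1));
        }
    }
    ```

    In an add-in, the rtf variable would instead be assigned from the appointment's RTFBody property, and the hex dump could be pasted into the thread for comparison.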

    We'll have two possibilities:

    - either you'll get the same byte values (same as mine) after which we need to understand why some letters are showing the incorrect byte values (see the first figure on top in which I say "Should be X not Y" in red)

    - or you'll get different results, probably the correct values after which we need to understand why I'm getting incorrect values for some letters.

    Let me know if you need any clarifications. 






    • Edited by Hisham7G Wednesday, June 7, 2017 9:32 AM
    Wednesday, June 7, 2017 9:14 AM
  • Hi Hisham7G,

    According to Rich Text Format, there have been several revisions of the RTF specification, published alongside major Microsoft Word and Office versions. The Character encoding chapter explains that code page escapes encode characters using a specific code page. After checking the Windows-1256 code page table, I think AppointmentItem.RTFBody returns the RTF byte array encoded with the Windows-1256 code page. For example, م is indeed E3 and ل is indeed E1. You could refer to that table when setting the RTF body bytes.
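
    The difference between the two tables can be checked with a short sketch that decodes the same six bytes twice. ISO-8859-6 (code page 28596) is used here only as an assumed example of a table in which E4 is LAM and E5 is MEEM; the thread does not say which table the original comparison was based on. The sketch assumes the .NET Framework; on .NET Core or later, the commented-out registration line would be needed.

    ```csharp
    using System;
    using System.Text;

    static class CodePageComparison
    {
        static void Main()
        {
            // On .NET Core / .NET 5+, install System.Text.Encoding.CodePages and uncomment:
            // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            // The bytes behind \'c7\'e1\'d3\'e1\'c7\'e3
            byte[] bytes = { 0xC7, 0xE1, 0xD3, 0xE1, 0xC7, 0xE3 };

            foreach (int codePage in new[] { 1256, 28596 })
            {
                string text = Encoding.GetEncoding(codePage).GetString(bytes);
                Console.Write("code page {0}:", codePage);
                foreach (char c in text) Console.Write(" U+{0:X4}", (int)c);
                Console.WriteLine();
            }
            // code page 1256:  U+0627 U+0644 U+0633 U+0644 U+0627 U+0645  (alef lam seen lam alef meem)
            // code page 28596: U+0627 U+0641 U+0633 U+0641 U+0627 U+0643  (alef feh seen feh alef kaf)
        }
    }
    ```

    Under Windows-1256, E1 and E3 decode to LAM and MEEM, so the bytes returned by Outlook are consistent with the text that was typed. Reading them against the other table gives FEH and KAF, which matches the mismatch reported in the question.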

    Best Regards,

    Terry

    Monday, June 12, 2017 2:54 AM
  • Hi Terry,

    Sorry for the late reply (been away) and thanks for your support.

    What you say actually makes sense. But that leads me to another question: in C#, I can't find any way to force Windows-1256 when writing AppointmentItem.RTFBody back. The following example illustrates the issue. If I use method 1 or 2, I still get malformed text. Do you have an idea how to handle that please?

    METHOD 1: using the default encoding

        // Read the AppointmentItem's RTF body
        string existingRtfBody = System.Text.Encoding.Default.GetString(appointment.RTFBody);

        // Writing it back to the AppointmentItem's RTF body gives malformed text
        appointment.RTFBody = System.Text.Encoding.Default.GetBytes(existingRtfBody);

    METHOD 2: using a code page / encoder

    Following the same logic as in Method 1 (reading from RTFBody, then writing back to it), but getting InternetCodepage from the AppointmentItem or fixing it as 1256 (for testing purposes):

        int codepage = 1256; // appointment.InternetCodepage;
        Encoding encoding = Encoding.GetEncoding(codepage);

        // Same read/write as in Method 1, using this encoding
        string existingRtfBody = encoding.GetString(appointment.RTFBody);
        appointment.RTFBody = encoding.GetBytes(existingRtfBody);

    Getting the MsoEncoding from Microsoft.Office.Interop.Word.Document's TextEncoding and using that to process the string gives the same malformed result.
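
    One way to rule the byte/string conversion in or out as the cause is to test whether a given encoding round-trips arbitrary bytes at all. The sketch below is an assumption-laden illustration rather than a fix from this thread: it uses ISO-8859-1 (code page 28591), which maps each byte value to the character with the same code point, and compares it with UTF-8.

    ```csharp
    using System;
    using System.Linq;
    using System.Text;

    static class RoundTrip
    {
        // True if decoding and re-encoding with the given encoding returns the same bytes.
        static bool Survives(Encoding encoding, byte[] original)
        {
            byte[] back = encoding.GetBytes(encoding.GetString(original));
            return original.SequenceEqual(back);
        }

        static void Main()
        {
            // Stand-in for appointment.RTFBody: every possible byte value once.
            byte[] original = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

            Console.WriteLine(Survives(Encoding.GetEncoding(28591), original)); // True
            Console.WriteLine(Survives(Encoding.UTF8, original));               // False
        }
    }
    ```

    If the bytes survive the round trip unchanged and the text is still altered after writing RTFBody back, the conversion itself is probably not the culprit, and whatever processes the string in between deserves a closer look.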

    Looking forward to your reply.

    Hicham




    • Edited by Hisham7G Wednesday, July 5, 2017 10:07 AM
    Tuesday, July 4, 2017 11:12 AM
  • Do not rely on a particular encoding, whether explicit in the RTF stream or implicit in Windows. Store Unicode characters in RTF in the form of "\u1234 ?"

    Dmitry Streblechenko (MVP)
    http://www.dimastr.com/redemption
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Tuesday, July 4, 2017 7:32 PM
  • Hi Dmitry,

    Can you elaborate please? I have actually tried with Unicode: converting AppointmentItem.RtfBody (bytes) into a string using Unicode gives a wrong text representation. If you could include a code snippet that'd be great.

    Thanks,

    Hicham


    • Edited by Hisham7G Wednesday, July 5, 2017 10:12 AM
    Wednesday, July 5, 2017 10:10 AM
  • No, what I mean is using the actual Unicode characters (e.g. "\u1234 ?") instead of relying on Outlook to render single byte characters (e.g. "\'c7\'e1") to the Unicode data using the appropriate codepage which it may or may not pick correctly.
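
    A minimal sketch of producing such escapes from a .NET string is shown below. RtfUnicodeEscape and Escape are hypothetical names, and the code assumes the default \uc1 setting, in which one fallback character (here "?") follows each \uN.

    ```csharp
    using System;
    using System.Text;

    static class RtfUnicodeEscape
    {
        // Converts plain text into an ASCII-only RTF fragment.
        public static string Escape(string text)
        {
            var sb = new StringBuilder();
            foreach (char c in text)
            {
                if (c == '\\' || c == '{' || c == '}')
                {
                    sb.Append('\\').Append(c);   // RTF special characters
                }
                else if (c < 0x80)
                {
                    sb.Append(c);                // plain ASCII passes through
                }
                else
                {
                    // \uN takes a signed 16-bit decimal value, so code units above
                    // 32767 are written as negative numbers.
                    sb.Append("\\u").Append(unchecked((short)c)).Append('?');
                }
            }
            return sb.ToString();
        }

        static void Main()
        {
            // The Arabic word from this thread, written with C# escapes.
            Console.WriteLine(Escape("\u0627\u0644\u0633\u0644\u0627\u0645"));
            // \u1575?\u1604?\u1587?\u1604?\u1575?\u1605?
        }
    }
    ```

    Because the result contains only ASCII characters, it no longer depends on which code page the reader picks for single-byte escapes.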

    Dmitry Streblechenko (MVP)

    Wednesday, July 5, 2017 1:37 PM
  • Thanks for your reply Dmitry.

    Because this project is an Outlook add-in, we have to read the body from AppointmentItem.RtfBody (which is RTF). So even before I use any encoding, Outlook will provide the characters \'c7\'e1 for example (as bytes) instead of Unicode values. Even if I convert the bytes using Encoding.Convert(srcEncoding, dstEncoding, bytes), I am not sure what the srcEncoding would be. If we get the srcEncoding from Outlook that'd mean relying on Outlook again. Please let me know if what I'm saying makes sense.

    Wednesday, July 5, 2017 1:52 PM
  • So how do you modify the RTF? Does it work if you set it back unmodified?

    Dmitry Streblechenko (MVP)

    Wednesday, July 5, 2017 3:35 PM
  • No, it does not work even if I set it back unmodified. That is my problem. I am not able to understand how Outlook handles it. Take the following two test cases:

    1. I read the appointment's RTF (using UTF-8, Default, the code page from InternetCodepage, or even a temporarily hard-coded 1256 as Terry suggested above), then immediately write it back without modifying it. The text changes.

    2. For testing purposes, if I save the content of Microsoft.Office.Interop.Word.Document into an RTF file using the method SaveAs2 and open the file by double-clicking it (it opens in Word), the text is correct! So Outlook does something (which I am not aware of) to keep bytes and text consistent while encoding and decoding.


    • Edited by Hisham7G Wednesday, July 5, 2017 3:48 PM
    Wednesday, July 5, 2017 3:47 PM
  • What code page? Do you convert the data to a string? RtfBody is an array, you should not touch it using any string functions.
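
    A byte-level edit that avoids string conversion of the existing content could look like the sketch below. It is only an illustration under stated assumptions: RtfByteInsert and InsertBeforeLastBrace are hypothetical names, the fragment must be ASCII-only RTF, and inserting just before the final closing brace is assumed to be an acceptable position, which may not hold for the RTF that Outlook actually produces.

    ```csharp
    using System;
    using System.Text;

    static class RtfByteInsert
    {
        // Inserts an ASCII-only RTF fragment just before the last '}' of the document.
        public static byte[] InsertBeforeLastBrace(byte[] rtf, string asciiFragment)
        {
            int pos = Array.LastIndexOf(rtf, (byte)'}');
            if (pos < 0) throw new ArgumentException("No closing brace found.", nameof(rtf));

            byte[] fragment = Encoding.ASCII.GetBytes(asciiFragment);
            byte[] result = new byte[rtf.Length + fragment.Length];

            Buffer.BlockCopy(rtf, 0, result, 0, pos);                                    // bytes before the brace
            Buffer.BlockCopy(fragment, 0, result, pos, fragment.Length);                 // new content
            Buffer.BlockCopy(rtf, pos, result, pos + fragment.Length, rtf.Length - pos); // brace and anything after
            return result;
        }

        static void Main()
        {
            // Hypothetical stand-in for appointment.RTFBody.
            byte[] rtf = Encoding.ASCII.GetBytes(@"{\rtf1\ansi Hello}");
            byte[] updated = InsertBeforeLastBrace(rtf, @"\par Generated line");
            Console.WriteLine(Encoding.ASCII.GetString(updated));
            // {\rtf1\ansi Hello\par Generated line}
        }
    }
    ```

    The existing bytes are copied untouched, so whatever code page Outlook used for them never has to be guessed. Any non-ASCII text in the fragment would first have to be written as RTF escapes.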

    Dmitry Streblechenko (MVP)

    Wednesday, July 5, 2017 4:03 PM
  • I'll answer your two questions separately:

    1. What code page? AppointmentItem.InternetCodepage, which tells us which code page is used for the appointment in question. Even if I retrieve that, get its corresponding encoding (Encoding encoding = Encoding.GetEncoding(codepage)) and use that encoding to convert RTFBody to text and then back, I can see the text change.

    2. Do you convert the data to a string? Yes, I do. Eventually I will be inserting some auto-generated text using RTF parser so I need to get the string value of the RTF content (from the byte array RTFBody). But for now, I am not doing any of that: all I want is be able to convert the byte array RTFBody to string then convert it back to byte array. FYI, that works perfectly with English (been using UTF-8) but the moment I add Arabic or Russian I start having issues. Can you elaborate here: why am I not supposed to touch RTFBody? The product I am working on is meant to insert auto-generated information for employees; I can't see any other way to achieve that.

    UPDATE: I've just come across your reply on another thread suggesting setting RTFBody. So I'm sure I've misunderstood your 2nd question.

    https://stackoverflow.com/questions/16916690/outlook-appointmentitem-how-do-i-add-some-data-into-the-rtf-mail-body-when-se 

    "As I have previously mentioned, you need to use the MeetingItem.RtfBody property. Forget about the Inspector object." – Dmitry Streblechenko Jun 9 '13 at 17:05




    • Edited by Hisham7G Wednesday, July 5, 2017 4:41 PM
    Wednesday, July 5, 2017 4:14 PM