RTFBody encoding

  • Question

  • Can someone give me a full explanation of the RTFBody property of an Outlook mailitem?

    It returns a byte array representing the RTF string of the mail body.

    The examples that describe how to get the string from the RTFBody generally suggest using ASCIIEncoding. It follows that UTF8Encoding can also be used, as the two are the same for non-extended characters, i.e. those at or below 127 in the ASCII / UTF-8 character sets.

    This works fine unless you have an extended character in the mail body.

    In this scenario the RTFBody byte array contains a byte value > 127, and you can, for example, look it up in a Unicode table to match the number to the character in the mail body.

    If you get the string using an ASCII or UTF-8 decoder, a lone byte above 127 is not valid input, so you get a substitute character (a question mark with ASCII) in the string for each extended character.

    If you instead use a Unicode (UTF-16) encoder, which could handle > 127 chars, the string is nonsense. That is predictable if you remember that the RTFBody appears to be an ASCII encoded string, so decoding it as Unicode makes no sense.

    It's as if I need to use an ASCII or UTF-8 encoder but also specify a particular code table for the extended character set but I can't see a way to do that with the Encoding class.

    So I'm left wondering how the RTFBody byte array is encoded and how I can decode it into a string to include the extended characters.

    Is it always ASCII encoded or is it somehow dependent on local internationalization settings?

    Wednesday, July 31, 2013 1:13 PM

All replies

  • RTF specifies the code page as one of its tags (e.g. "\ansicpg1252")  - look at the RTF body with OutlookSpy (click IMessage button, select the PR_RTF_COMPRESSED property).

    Also note that in most cases Unicode characters are encoded using RTF rules (e.g. \u1234)
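
    As a rough illustration of the "\ansicpg" point, the declared code page can be read straight from the byte array, because RTF control words are plain ASCII. This is only a sketch: the class and method names are made up for the example, and the choice to scan just the first 1024 bytes is an assumption, not something RTF or Outlook guarantees.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Outlook object model.
public static class RtfCodePage
{
    // Returns the code page declared by \ansicpgN, or null if none is found.
    public static int? Detect(byte[] rtfBytes)
    {
        // Assumption: the header sits within the first 1024 bytes.
        int length = Math.Min(rtfBytes.Length, 1024);

        // Control words are ASCII, so an ASCII decode is enough to read them;
        // any byte above 127 simply turns into '?' and is ignored here.
        string header = Encoding.ASCII.GetString(rtfBytes, 0, length);

        Match match = Regex.Match(header, @"\\ansicpg(\d{1,5})");
        return match.Success ? int.Parse(match.Groups[1].Value) : (int?)null;
    }
}
```

    Calling RtfCodePage.Detect(mailItem.RTFBody as byte[]) would then give the number to pass to Encoding.GetEncoding, or null when the header carries no \ansicpg tag.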

    Dmitry Streblechenko (MVP)
    Redemption - what the Outlook
    Object Model should have been
    Version 5.5 is now available!

    Wednesday, July 31, 2013 4:42 PM
  • Hi Dmitry,

    I've read a few of your forum responses over the last few weeks so appreciate your valuable input. I've checked and I also have code page 1252 for extended character set.

    For a while, knowing that didn't help me, but when I looked through the different encodings available to me from System.Text.Encoding, only one of them had the matching code page of 1252. It was the default encoding, so I'm concluding that the default OS ANSI encoding is used to create the RTFBody byte array.

    I'm now running with:

    byte[] byteArray = mailItem.RTFBody as byte[];
    Encoding encoding = Encoding.Default;
    string RTF = encoding.GetString(byteArray);

    which is working well.

    I did get a bit nervous when reading around the use of Encoding.Default where there is a suggestion that it should be avoided. However, I think that advice applies more where you are encoding on one machine and decoding on another.
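
    One way to avoid depending on Encoding.Default is to name the code page explicitly. The sketch below assumes code page 1252, which is what this particular mail declared; the class and method names are hypothetical. It also assumes a runtime where CodePagesEncodingProvider is available (built in on .NET Core 3.0 and later; on .NET Framework, Encoding.GetEncoding(1252) works without the RegisterProvider line).

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class RtfCodec
{
    static RtfCodec()
    {
        // Needed on .NET Core / .NET 5+ so that Windows code pages
        // such as 1252 can be resolved by Encoding.GetEncoding.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes the RTF bytes with an explicitly chosen code page.
    public static string Decode(byte[] rtfBytes, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(rtfBytes);
    }

    // Encodes the (possibly modified) RTF text back with the same code page.
    public static byte[] Encode(string rtf, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetBytes(rtf);
    }
}
```

    With the bytes 0x63 0x61 0x66 0xE9, Decode(bytes, 1252) gives "café", whereas Encoding.ASCII.GetString gives "caf?". Encoding back with the same code page returns the original four bytes.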

    • Edited by pscross Thursday, August 1, 2013 12:30 AM
    Thursday, August 1, 2013 12:30 AM
  • This just happens to be the code page of your current locale... It will be different under a different locale.

    Why exactly do you want to convert binary RTF data to a string? It is not a string, and cannot be represented as a string unless all extended characters are RTF encoded.

    Dmitry Streblechenko (MVP)

    Thursday, August 1, 2013 6:15 AM
  • I have a requirement (quite a common one judging by discussions I've seen online) to stamp the body of the mail with extra information - in my case classification information collected by a custom dialog on the email send event.

    I can't restrict the users on format type, so I need to update the body for each possible format type that the user may choose: plain text, HTML and RTF.

    Stamping the body involves finding the top of the body text, inserting some custom text and saving back to the mail.

    To that end, I need to get the RTF text of the mail to be updated and as far as I can see the RTFBody is the only route in to get that RTF string representation. (I don't have Redemption by the way so if that assists with this task then please let me know.)

    This is why I need to understand exactly what RTFBody is. I'm seeing it as a binary encoding of the RTF string representation of the mail body. If I can understand exactly how it is encoded, I should be able to decode it, update it, encode it and save it back to the mail.

    Indeed I can currently do all this but I need to be confident that I can choose the matching decoder to handle the extended characters.

    I think I see what you mean by your comment "This just happens to be the code page of your current locale". I can imagine that an email from another machine and locale would have a different encoding for RTFBody. Perhaps what I can do here is read the code page from the ASCII decoded RTF string, and if I have a decoder for that code page I can continue; otherwise I could advise the user to change to HTML format. In most cases for my English-speaking customers I bet the code page will be 1252 and I can use RTF fine. There will just be a minority of cases where I can't decode and need to use the HTML fallback.
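
    The "continue if I have a decoder, otherwise fall back" check could look something like the sketch below. The helper name is hypothetical, and what the caller does on failure (for example prompting for HTML format) is left out. Which code pages resolve depends on the runtime and on any registered encoding providers.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class RtfEncodingLookup
{
    // Returns true and the encoding when the code page is supported,
    // false when the caller should fall back to another format.
    public static bool TryGet(int codePage, out Encoding encoding)
    {
        try
        {
            encoding = Encoding.GetEncoding(codePage);
            return true;
        }
        catch (ArgumentException)
        {
            // Invalid or unsupported code page number.
        }
        catch (NotSupportedException)
        {
            // Code page not supported on this platform.
        }

        encoding = null;
        return false;
    }
}
```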

    Also, I'm happy to consider alternative approaches if there is a fundamental unreliability in this approach. I have considered for example that it may be better to do the updating of the mail using Exchange transport rules instead (but I'd need to do further research!).

    Sorry for the long text and thanks for your advice.

    • Edited by pscross Thursday, August 1, 2013 9:50 AM
    Thursday, August 1, 2013 9:12 AM
  • Again, you should not treat RTF body as a string. It is a binary blob (array of byte) and should be treated as such.

    Dmitry Streblechenko (MVP)

    Thursday, August 8, 2013 8:16 PM
  • Yes it's a byte array but the bytes represent an encoding of a string.

    You're advising I think that I should never get and work with that string. Is that precisely because of the difficulties I'm having in decoding it properly? (despite there being examples of doing it on MSDN).

    As alternative approaches I see:

    - using the Word object model of the mailitem body (not a nice solution as you get a horrible flashing screen as it selects and inserts in front of your eyes!)

    - always convert to HTML and never work with RTF

    - using the byte array directly without decoding it to a string (gonna be pretty difficult but I wonder if that's what you're suggesting)

    - using Exchange transport rules instead (architecturally I have to say this would be my preferred approach)

    Do you have a different approach in mind or would you use one of the above?
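
    For the third option above (working on the byte array directly), a minimal sketch might escape the stamp text into ASCII-only RTF and splice its bytes into the array, so that the original bytes are never decoded. Everything here is an assumption for illustration: the names are hypothetical, the insertion offset has to be worked out by the caller (finding the top of the body text is not shown), and control characters such as line breaks are not converted to RTF control words.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class RtfStamp
{
    // Converts text to ASCII-only RTF: escapes \ { } and writes each
    // non-ASCII UTF-16 code unit as \uN? where N is a signed 16-bit decimal
    // and '?' is the single fallback character (the RTF default, \uc1).
    public static string Escape(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);
            else if (c < 128)
                sb.Append(c);
            else
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
        }
        return sb.ToString();
    }

    // Returns a new array with the ASCII-only fragment inserted at offset.
    // The existing bytes are copied untouched, whatever their code page.
    public static byte[] InsertAt(byte[] rtfBytes, int offset, string asciiRtfFragment)
    {
        byte[] fragment = Encoding.ASCII.GetBytes(asciiRtfFragment);
        byte[] result = new byte[rtfBytes.Length + fragment.Length];
        Buffer.BlockCopy(rtfBytes, 0, result, 0, offset);
        Buffer.BlockCopy(fragment, 0, result, offset, fragment.Length);
        Buffer.BlockCopy(rtfBytes, offset, result, offset + fragment.Length, rtfBytes.Length - offset);
        return result;
    }
}
```

    Because the fragment is pure ASCII, the result does not depend on the \ansicpg value of the message or on the locale of the machine doing the stamping.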


    Thursday, August 8, 2013 10:11 PM
  • No, the bytes do not represent a string at all. At least not in the .NET sense, where each character is 2 bytes.

    Dmitry Streblechenko (MVP)

    Monday, August 12, 2013 2:17 AM