Invalid information in: MS-OXMSG - Outlook Message File format (.msg)

  • Question

  • Hi,
     
    Regarding document MS-OXMSG Section 2.2.3.1.3 String Stream.

    The bytes stored in this stream seem to contain both Unicode and ANSI strings. The documentation claims that all strings are Unicode. This is definitely not true, as I have many .msg files that contain both Unicode and ANSI strings.

    Question:
    How do we determine how to read the bytes for each of these strings? There must be a property somewhere that tells us how the bytes are formatted.

    If not, how do we determine the string type (Unicode or ANSI) by reading the bytes?

    Russell Mangel
    Las Vegas, NV
    • Moved by Steve Smegner Wednesday, August 27, 2008 7:43 AM Moved per Tom Devey for follow-up in Exchange Forum as needed. (Moved from Office Binary File Formats to Using the Exchange Server Protocols)
    Saturday, July 26, 2008 3:57 AM

Answers

  • Russell,

    After looking at this, I think there is a bug in some add-in that is creating named properties incorrectly through MAPI. You are supposed to create named properties by passing a wide string in the MAPINAMEID structure, but if someone passes ANSI strings there, the code treats them as Unicode, reading two bytes at a time, and saves them as they are. You will notice that all of the “ANSI” strings still end with a double NULL byte, 00 00.

    I think an add-in or something similar is creating these PR_RIM_* properties and may be doing it wrong, but this is just a guess from looking at the stream data.
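
To illustrate the guess above, here is a minimal Python sketch of what a wcslen-style scan would do to an ANSI buffer. The function name `wide_length_in_bytes` and the zeroed memory after the string are assumptions made for illustration; this is not the actual MAPI code.

```python
def wide_length_in_bytes(buffer: bytes) -> int:
    """Mimic a wide-string length scan: walk 2-byte units until a 00 00 pair.

    Returns the number of bytes that precede the double-NUL terminator.
    """
    for i in range(0, len(buffer) - 1, 2):
        if buffer[i] == 0 and buffer[i + 1] == 0:
            return i
    raise ValueError("no double-NUL terminator found")


# An ANSI name with its single NUL terminator. The extra zero bytes stand in
# for whatever followed the string in memory (an assumption).
ansi_name = b"PR_RIM_MSG_STATUS"
buffer = ansi_name + b"\x00" + b"\x00" * 4

stored_length = wide_length_in_bytes(buffer)
print(hex(stored_length))       # byte length a wide-string writer would record
print(buffer[:stored_length])   # bytes it would copy into the stream
```

Under these assumptions, a 17-character ANSI name yields a byte length of 0x12, and a 20-character name such as PR_RIM_MSG_FOLDER_ID yields 0x14, which matches the length prefixes in the posted stream data.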


    Developer Consultant
    Friday, August 22, 2008 10:51 PM
    Moderator

All replies

  • Thanks for your inquiry, Russell!

    We'll review the information and get back to you with our findings.

    Regards,
    SEBASTIAN CANEVARI - MSFT SEE Protocol Documentation Team
    Saturday, July 26, 2008 3:22 PM
  • Russell,

    Properties of type PtypString must follow the rules spelled out below with regard to the stream. Properties of type PtypString8 should be ANSI.

    In reviewing section 2.2.3.1.3, you are right that it conflicts with the section I quote below. I have filed a bug against the documentation to clarify this section.

    If you do find a string property type mismatch, please let me know.

    <snippet source="[MS-OXMSG]">
    2.1.2 Variable Length Properties
    ...

    The following is an exhaustive list of variable length property types:

    • PtypString
    • PtypBinary
    • PtypString8
    • PtypGuid

    ...

    If the PidTagStoreSupportMask [MS-OXPROPS] property is present and has the STORE_UNICODE_OK (bitmask 0x00040000) flag set, all string properties in the .MSG file MUST be present in Unicode format. If the PidTagStoreSupportMask [MS-OXPROPS] is not available in the property stream or if the STORE_UNICODE_OK (bitmask 0x00040000) flag is not set, the .MSG file MUST be considered as non-Unicode and all string properties in the file MUST be in non-Unicode format. All string properties for a Message object MUST be either Unicode or non-Unicode.

    The .MSG file format specification does not allow the presence of both simultaneously.
    </snippet>
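
The rule quoted above can be sketched in Python as follows. The function name `msg_strings_are_unicode` is hypothetical, and the sketch assumes the caller has already read PidTagStoreSupportMask from the property stream and passes `None` when the property is absent.

```python
from typing import Optional

# Bitmask quoted in [MS-OXMSG] section 2.1.2.
STORE_UNICODE_OK = 0x00040000


def msg_strings_are_unicode(store_support_mask: Optional[int]) -> bool:
    """Return True if the string properties in the .MSG file must be Unicode.

    store_support_mask is the value of PidTagStoreSupportMask, or None when
    the property is not present in the property stream.
    """
    if store_support_mask is None:
        # Property missing: the file is to be treated as non-Unicode.
        return False
    return bool(store_support_mask & STORE_UNICODE_OK)
```

A reader would then pick one decoding for every string property in the file, since the quoted text does not allow both formats at once.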


    Developer Consultant
    Tuesday, August 5, 2008 8:39 PM
    Moderator
  • I have 4 million Outlook .msg files on disk, generated using Outlook 2000/2002/2007. All of them are stored as ANSI. PidTagStoreSupportMask is not set. All string properties appear to be in ANSI. However, the storage "__nameid_version1.0" contains a string stream, "__substg1.0_00040102", that holds two types of strings: ANSI and Unicode. I have copied the bytes of this string stream from one .msg file below; see for yourself.

    The only way I can see to read this is to know the offset (which is no problem) *and* whether the named property you are looking for is PtypString or PtypString8. The String Stream section in [MS-OXPROPS] does not say.

    Example:
    From Section 3.2.1 Property ID to Property Name:
    You have the value 0x8005, and now you must find the string name for that named property. Once you have the entry stream, it will contain the string's offset in the string stream; let's say it is 0x00. Great, now how do I read the bytes? They are in Unicode. But now let's say you asked for 0x8006 and its offset points to one of the ANSI strings. How do I know that those bytes are in ANSI? It seems (but I have not tested this) that lowercase strings are in Unicode and UPPERCASE strings are in ANSI. Again, I have millions of .msg files that are this way.

    If you need some .msg files for testing, I can supply any you need.

    Here are the bytes from the string stream "__substg1.0_00040102".

    0000 20 00 00 00 78 00 2D 00 6F 00 72 00 69 00 67 00  ...x.-.o.r.i.g.
    0010 69 00 6E 00 61 00 74 00 69 00 6E 00 67 00 2D 00 i.n.a.t.i.n.g.-.
    0020 69 00 70 00 14 00 00 00 50 52 5F 52 49 4D 5F 4D i.p.....PR_RIM_M
    0030 53 47 5F 46 4F 4C 44 45 52 5F 49 44 12 00 00 00 SG_FOLDER_ID....
    0040 50 52 5F 52 49 4D 5F 4D 53 47 5F 53 54 41 54 55 PR_RIM_MSG_STATU
    0050 53 00 00 00 12 00 00 00 50 52 5F 52 49 4D 5F 4D S.......PR_RIM_M
    0060 53 47 5F 52 45 46 5F 49 44 00 00 00 18 00 00 00 SG_REF_ID.......
    0070 50 52 5F 52 49 4D 5F 4D 53 47 5F 4F 4E 5F 44 45 PR_RIM_MSG_ON_DE
    0080 56 49 43 45 5F 33 5F 36 14 00 00 00 50 52 5F 52 VICE_3_6....PR_R
    0090 49 4D 5F 50 41 47 45 52 5F 54 58 5F 46 4C 41 47 IM_PAGER_TX_FLAG
    00A0 2A 00 00 00 78 00 2D 00 6F 00 72 00 69 00 67 00 *...x.-.o.r.i.g.
    00B0 69 00 6E 00 61 00 6C 00 61 00 72 00 72 00 69 00 i.n.a.l.a.r.r.i.
    00C0 76 00 61 00 6C 00 74 00 69 00 6D 00 65 00 00 00 v.a.l.t.i.m.e...
    00D0 18 00 00 00 50 52 5F 52 49 4D 5F 44 45 4C 45 54 ....PR_RIM_DELET
    00E0 45 44 5F 42 59 5F 44 45 56 49 43 45 14 00 00 00 ED_BY_DEVICE....
    00F0 78 00 2D 00 61 00 69 00 6D 00 61 00 69 00 6C 00 x.-.a.i.m.a.i.l.
    0100 65 00 72 00 10 00 00 00 78 00 2D 00 61 00 69 00 e.r.....x.-.a.i.
    0110 6D 00 69 00 6D 00 65 00 16 00 00 00 78 00 2D 00 m.i.m.e.....x.-.
    0120 61 00 69 00 6D 00 63 00 2D 00 61 00 75 00 74 00 a.i.m.c.-.a.u.t.
    0130 68 00 00 00 1E 00 00 00 78 00 2D 00 61 00 69 00 h.......x.-.a.i.
    0140 6D 00 63 00 2D 00 6D 00 61 00 69 00 6C 00 66 00 m.c.-.m.a.i.l.f.
    0150 72 00 6F 00 6D 00 00 00 16 00 00 00 72 00 65 00 r.o.m.......r.e.
    0160 74 00 75 00 72 00 6E 00 2D 00 70 00 61 00 74 00 t.u.r.n.-.p.a.t.
    0170 68 00 00 00 10 00 00 00 4B 00 65 00 79 00 77 00 h.......K.e.y.w.
    0180 6F 00 72 00 64 00 73 00                         o.r.d.s.
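
For reference, the layout visible in the dump above (a 4-byte little-endian byte length, the name bytes, then padding to a 4-byte boundary) can be walked with a short Python sketch. The encoding test is only a hypothetical heuristic, not something the specification defines: it treats a name as UTF-16LE when every second byte is 0x00, which holds only for ASCII-range names. The cp1252 code page and the names `read_name` and `next_offset` are likewise assumptions.

```python
import struct
from typing import Tuple


def read_name(stream: bytes, offset: int) -> Tuple[str, str]:
    """Read one length-prefixed name at offset; return (name, guessed encoding)."""
    (length,) = struct.unpack_from("<I", stream, offset)
    raw = stream[offset + 4 : offset + 4 + length]
    if len(raw) != length:
        raise ValueError("entry runs past the end of the stream")
    # Heuristic (assumption): ASCII-range UTF-16LE text has 0x00 in every
    # odd-indexed byte; anything else is treated as a single-byte string.
    looks_wide = length > 0 and length % 2 == 0 and all(b == 0 for b in raw[1::2])
    if looks_wide:
        return raw.decode("utf-16-le"), "utf-16-le"
    # Single-byte names may carry a trailing NUL inside the counted length.
    return raw.rstrip(b"\x00").decode("cp1252"), "ansi"


def next_offset(stream: bytes, offset: int) -> int:
    """Offset of the following entry: 4-byte prefix plus name padded to 4 bytes."""
    (length,) = struct.unpack_from("<I", stream, offset)
    return offset + 4 + ((length + 3) & ~3)


# The first three entries of the dump above, rebuilt by hand.
sample = (
    struct.pack("<I", 0x20) + "x-originating-ip".encode("utf-16-le")
    + struct.pack("<I", 0x14) + b"PR_RIM_MSG_FOLDER_ID"
    + struct.pack("<I", 0x12) + b"PR_RIM_MSG_STATUS\x00" + b"\x00\x00"
)

offset = 0
while offset < len(sample):
    name, encoding = read_name(sample, offset)
    print(hex(offset), encoding, name)
    offset = next_offset(sample, offset)
```

In practice the offset would come from the entry stream rather than from a sequential walk; the loop here only shows that the length prefixes and padding line up with the posted bytes.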





    Friday, August 8, 2008 9:29 AM
  • Hi,

    Just wondering what the status is of my question?

    I still don't know how to decode the string stream structure, because it contains both Unicode and ANSI strings.

    I posted a reply with some sample data that shows my problem.

    Thanks.

    Wednesday, August 20, 2008 5:10 AM
  • Russell,

    Sorry I missed your follow-up question.  I got a nice nudge from Steve Smegner today :).  Let me dig into this.
    Developer Consultant
    Wednesday, August 20, 2008 4:33 PM
    Moderator