How can I determine programmatically which item(s) in the ObjectPool show up in Word?

  • Question

  • I am writing a utility to extract Office (and non-office) embedded objects from Word docs saved in the older binary format.  While testing this, I encountered a few cases where there are three Excel workbooks and a .lnk file in the ObjectPool of the storage file, but there appears to be only one Excel workbook embedded and displayed as an icon in the file when Word is opened.  Turning on Track Changes/Show Markup in Word does not reveal the other files in the ObjectPool. 

    Can anyone offer any insight into why the other objects are in the ObjectPool but do not show up in Word?  How can I determine programmatically which item(s) in the ObjectPool show up in Word?

    Tuesday, December 3, 2013 3:17 AM

Answers

  • Hi Satkinso,

    Instead of using a sample document, let me just outline the process for finding the OLE objects. I will leave the implementation to you. If you have questions as you implement, please feel free to post them.

    OLE objects are stored in fields in Word documents.  The field characters are in the WordDocument stream and can be in different parts of the document within that stream.  To find the field lists for the different document parts, search the members of FIB.FibRgFcLcb97 for names containing “plcffld” (case-insensitive).  There are fc/lcb member pairs such as fcPlcfFldMom/lcbPlcfFldMom, fcPlcfFldTxbx/lcbPlcfFldTxbx and fcPlcfFldHdr/lcbPlcfFldHdr, which give the offset (in the Table stream) and size of the PlcFld structures for the main document, textbox and header parts, respectively.  Other parts contain fields as well, and you will need to search those too.  Each of these PlcFlds ([MS-DOC] 2.8.25 – read this) maps CPs (character positions – 2.2.1) to Flds (field definitions – 2.9.88).  Fields are based on [RFC 4234].  Find Flds with fldch=0x13 (field begin) and:

         grffld = 0x3A (EMBED) or
         grffld = 0x38 (LINK) or
         (and if desired:)
         grffld = 0x57 (CONTROL) or
         grffld = 0x58 (HTMLCONTROL)
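
    As a rough illustration of this step, here is a minimal Python sketch that decodes a PlcFld and picks out the field-begin entries with an OLE field type. It assumes the PlcFld bytes have already been read from the Table stream using the fc/lcb pair from the FIB, and that the layout is n+1 four-byte CPs followed by n two-byte Flds. The function names parse_plcfld and ole_field_begins are made up for this example.

```python
import struct

# Field types of interest, as listed in the post above.
OLE_FIELD_TYPES = {0x3A: "EMBED", 0x38: "LINK", 0x57: "CONTROL", 0x58: "HTMLCONTROL"}

def parse_plcfld(data):
    """Decode raw PlcFld bytes into a list of (cp, ch, grffld) tuples.

    Assumed layout: (n + 1) little-endian 32-bit CPs, then n Flds of 2 bytes
    each (fldch, grffld). The last CP has no Fld and is ignored here.
    """
    n = (len(data) - 4) // 6
    cps = struct.unpack_from("<%dI" % (n + 1), data, 0)
    fld_base = 4 * (n + 1)
    flds = []
    for i in range(n):
        fldch, grffld = struct.unpack_from("<BB", data, fld_base + 2 * i)
        flds.append((cps[i], fldch & 0x1F, grffld))  # low 5 bits hold 0x13/0x14/0x15
    return flds

def ole_field_begins(flds):
    """Return (cp, type_name) for every field begin whose type is an OLE type."""
    return [(cp, OLE_FIELD_TYPES[g]) for cp, ch, g in flds
            if ch == 0x13 and g in OLE_FIELD_TYPES]
```

    The same two functions would be run once per PlcFld (main document, headers, textboxes and so on).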

    Find the corresponding Fld with fldch=0x14 (field separator). NOTE: fields can nest, so the corresponding Fld is not always the next one.

    Find the corresponding Fld with fldch=0x15 (field end).  If the Fld of the field end has grffldEnd.fZombieEmbed set, skip that field.
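
    Because fields can nest, one way to pair each begin with its own separator and end is a stack. The sketch below is only an illustration: it takes (cp, ch, grffld) tuples in CP order, and the 0x02 mask for fZombieEmbed is an assumption about the bit position that should be checked against the grffldEnd definition in [MS-DOC]. The names match_fields and live_ole_fields are hypothetical.

```python
OLE_FIELD_TYPES = {0x3A, 0x38, 0x57, 0x58}  # EMBED, LINK, CONTROL, HTMLCONTROL
ZOMBIE_EMBED_MASK = 0x02  # assumed bit position of grffldEnd.fZombieEmbed

def match_fields(flds):
    """Pair begin/separator/end marks, honoring nesting.

    flds: iterable of (cp, ch, grffld) in CP order.
    Returns a list of dicts, one per completed field, in the order they close.
    """
    stack, done = [], []
    for cp, ch, grffld in flds:
        if ch == 0x13:                       # field begin: grffld is the field type
            stack.append({"type": grffld, "begin": cp, "sep": None, "end": None})
        elif ch == 0x14 and stack:           # separator belongs to innermost open field
            stack[-1]["sep"] = cp
        elif ch == 0x15 and stack:           # end closes the innermost open field
            field = stack.pop()
            field["end"] = cp
            field["zombie"] = bool(grffld & ZOMBIE_EMBED_MASK)
            done.append(field)
    return done

def live_ole_fields(flds):
    """Keep OLE-type fields that have a separator and are not zombie embeds."""
    return [f for f in match_fields(flds)
            if f["type"] in OLE_FIELD_TYPES and f["sep"] is not None and not f["zombie"]]
```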

    These define the boundaries of the OLE field.  Next, find the direct formatting of the field separator character 0x14. There is an example of direct formatting in 3.4. Essentially, every document has a PlcfBteChpx (FIB.FibRgFcLcb97.fcPlcfBteChpx), which maps FCs to the location of a ChpxFkp, which in turn maps FCs to sets of properties. Note that these are FCs (byte offsets in the WordDocument stream), not CPs.
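
    To make the two-level lookup a little more concrete, here is a hedged Python sketch. It assumes the PlcfBteChpx is n+1 FCs followed by n four-byte page numbers (low 22 bits, times 512 for the byte offset in the WordDocument stream), and that a ChpxFkp is a 512-byte page whose last byte is the run count. Converting the separator's CP to an FC (via the piece table) is not shown. All function names are invented for this example.

```python
import struct

def parse_plcf_bte_chpx(data):
    """Return (fc_start, fc_end, fkp_offset) per entry of a PlcfBteChpx."""
    n = (len(data) - 4) // 8
    fcs = struct.unpack_from("<%dI" % (n + 1), data, 0)
    pns = struct.unpack_from("<%dI" % n, data, 4 * (n + 1))
    return [(fcs[i], fcs[i + 1], (pns[i] & 0x3FFFFF) * 512) for i in range(n)]

def parse_chpx_fkp(page):
    """Return (fc_start, fc_end, grpprl_bytes) per run of a 512-byte ChpxFkp."""
    crun = page[511]                               # number of runs on this page
    rgfc = struct.unpack_from("<%dI" % (crun + 1), page, 0)
    runs = []
    for i in range(crun):
        word_off = page[4 * (crun + 1) + i]        # offset of the Chpx, in 2-byte units
        if word_off == 0:
            grpprl = b""                           # no direct formatting for this run
        else:
            off = word_off * 2
            cb = page[off]                         # Chpx.cb: length of the grpprl
            grpprl = bytes(page[off + 1:off + 1 + cb])
        runs.append((rgfc[i], rgfc[i + 1], grpprl))
    return runs

def grpprl_at(runs, fc):
    """Return the grpprl covering byte offset fc, or None if no run covers it."""
    for start, end, grpprl in runs:
        if start <= fc < end:
            return grpprl
    return None
```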

    In the properties for the separator character, check for:

          sprmCFSpec (operand of 1)
          sprmCFOle2 (operand of 1)
          sprmCPicLocation (operand non-zero)
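
    A possible way to run this check over a character grpprl is sketched below in Python. The opcode values (0x0855, 0x080A, 0x6A03) and the operand-size rule based on the top three bits of the opcode are taken from my reading of [MS-DOC] and should be verified there. Treating the sprmCPicLocation operand as unsigned is an assumption, and the function names are hypothetical.

```python
import struct

SPRM_CFSPEC = 0x0855        # assumed opcode values; verify against [MS-DOC] 2.6.1
SPRM_CFOLE2 = 0x080A
SPRM_CPICLOCATION = 0x6A03

# Operand size in bytes keyed by spra (top 3 bits of the opcode); 6 is variable.
FIXED_OPERAND_SIZE = {0: 1, 1: 1, 2: 2, 3: 4, 4: 2, 5: 2, 7: 3}

def parse_grpprl(grpprl):
    """Return {opcode: operand_bytes} for a character-property grpprl."""
    props, pos = {}, 0
    while pos + 2 <= len(grpprl):
        (sprm,) = struct.unpack_from("<H", grpprl, pos)
        pos += 2
        spra = sprm >> 13
        if spra == 6:                      # variable length: first byte is the size
            if pos >= len(grpprl):
                break
            size = grpprl[pos]
            pos += 1
        else:
            size = FIXED_OPERAND_SIZE[spra]
        props[sprm] = bytes(grpprl[pos:pos + size])
        pos += size
    return props

def ole_pic_location(grpprl):
    """Return the sprmCPicLocation operand if all three checks pass, else None."""
    props = parse_grpprl(grpprl)
    if props.get(SPRM_CFSPEC) != b"\x01" or props.get(SPRM_CFOLE2) != b"\x01":
        return None
    raw = props.get(SPRM_CPICLOCATION)
    if raw is None or len(raw) != 4:
        return None
    (location,) = struct.unpack("<I", raw)  # unsigned read is an assumption
    return location or None
```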

    All of these are defined in section 2.6.1.  Extract the operand of sprmCPicLocation, convert it to a base-10 string (for example “1234567”) and prepend an underscore (for example “_1234567”).  Find a storage with that name under the ObjectPool storage (defined in section 2.1.4), and there’s the OLE object’s data.
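
    The name conversion, and the comparison that answers the original question (which ObjectPool entries are live and which are leftovers), could look roughly like this in Python. Listing the storages under ObjectPool needs a compound-file reader, which is not shown; pool_names is assumed to be that listing, and both function names are made up for this sketch.

```python
def storage_name(pic_location):
    """Base-10 string of the sprmCPicLocation operand, prefixed with an underscore."""
    return "_%d" % pic_location

def split_live_and_orphaned(pool_names, live_locations):
    """Partition ObjectPool storage names into (live, orphaned) sorted lists.

    pool_names: names of the storages found directly under ObjectPool.
    live_locations: sprmCPicLocation operands collected from live OLE fields.
    """
    live_names = {storage_name(loc) for loc in live_locations}
    live = sorted(name for name in pool_names if name in live_names)
    orphaned = sorted(name for name in pool_names if name not in live_names)
    return live, orphaned
```

    With three workbooks and a .lnk in the pool but only one live field, the first list would hold one name and the second the other three.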

    To find the presentation (i.e. metafile, icon or URL) for the object, go back to the PlcFld and find the corresponding Fld with fldch=0x15.  Again, note that it’s not necessarily the Fld immediately following the one for the separator. All text between the CP of the separator and the CP of the end character is the field result.  For EMBED/CONTROL/HTMLCONTROL fields, this is an inline picture character.  For LINK fields, it could be text or a picture.
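
    For illustration only, slicing out the field result might look like this in Python. The sketch assumes the document text has already been decoded into a string indexed by CP, and that the inline picture character is U+0001; both are assumptions to verify against [MS-DOC]. The function names are hypothetical.

```python
PICTURE_CHAR = "\x01"  # assumed code point of the inline picture character

def field_instruction(text, begin_cp, sep_cp):
    """Text strictly between the field begin and the separator."""
    return text[begin_cp + 1:sep_cp]

def field_result(text, sep_cp, end_cp):
    """Text strictly between the separator and the field end."""
    return text[sep_cp + 1:end_cp]

def result_is_inline_picture(text, sep_cp, end_cp):
    """True when the whole field result is a single picture character."""
    return field_result(text, sep_cp, end_cp) == PICTURE_CHAR
```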

    By using the PlcFld pointers from the FIB, you will be finding only the OLE objects that are actually live in the document. 

    Thanks, Vilmos

    Saturday, December 21, 2013 10:02 AM

All replies

  • Hello Satkinso:

    Thank you for contacting Microsoft Support. A support engineer will be in touch to assist further.

    Thanks.


    Tarun Chopra | Escalation Engineer | Open Specifications Support Team

    Tuesday, December 3, 2013 4:09 AM
  • Hi Satkinso,

    I am the engineer who will be working with you on this issue. In order to better understand your question, please send the doc file as attachment to ‘dochelp (at) microsoft (dot) com’ and in the e-mail indicate that it is for me. Be sure that the file does not contain any confidential information.

    Regards,
    Vilmos Foltenyi - MSFT

    Tuesday, December 3, 2013 6:15 PM
  • Hi,

    Thanks for your quick response.  I am trying to get permission from the customer to send one of their doc files that has this issue.  I will get back to you soon.

    Regards,

    Satkinso

    Wednesday, December 4, 2013 5:17 AM
  • Hi Satkinso,

    If you have a problem getting your customer’s permission to send the Word document, could you create a similar test document we can examine?

    Thanks, Vilmos

    Wednesday, December 11, 2013 7:16 PM
  • Hi Satkinso,

    Because there has been no response on this issue since my last posting on Saturday, December 21, I assume that my explanation was adequate, that your problem is solved and that you no longer require my assistance. I’ll mark my previous post as the answer.

    Thanks, Vilmos

    Thursday, December 26, 2013 9:51 PM