Get Word ML from clipboard

  • Question

  • I am intercepting the paste event for a RichTextBox in order to process the contents before pasting. If they contain tables, images, etc., I need to do some custom processing. If the copied selection is from Word 2010 and consists of mixed content (e.g. text and a table or image), Word places the content on the clipboard in a number of formats. These include HTML and RTF, but I would rather work with WordML. I've used ClipSpy to check which formats and data are actually put on the clipboard, and the "Embed Source" format seems to be the one containing WordML. I was hoping this could be opened as a Package:

    var stream = Clipboard.GetData("Embed Source") as MemoryStream;
    var package = Package.Open(stream);

    I tried the Open XML SDK as well. It throws an EndOfStreamException, and I'm thinking the data might be wrapped in something else. I can write the stream to disk, open it using 7-zip, and see that the contents are as expected. So basically two questions: Is "Embed Source" the right DataObject to get the WordML? If it is, how do I deserialize it?

    Tuesday, July 2, 2013 9:11 AM

All replies

  • Could you show us a small example of the XML you're getting from Embed Source?

    I'm guessing it may be in the Open XML "flat file" format, which means it's not a Zip Package, the way the Open XML SDK is expecting. If that's the case, this blog post could help: http://blogs.msdn.com/b/ericwhite/archive/2008/09/29/transforming-flat-opc-format-to-open-xml-documents.aspx


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, July 2, 2013 10:07 AM
    Moderator
  • Thanks for replying Cindy.

    After reading about the Flat format I decided to look at the actual file after writing the stream to disk. I can unzip it and see that it contains the structure and contents of the package format ([Content_Types].xml, _rels, docProps and the word folder). If I open it in Word, it tells me the document is corrupt, but it can actually recover it and show the copied contents.

    So I did a binary comparison of the file before and after recovery. Word has removed something from the beginning of the file. The first two bytes of a proper docx file should be PK, but for this file I have D0 CF 11 E0 A1 B1 1A E1, which turns out to be the signature of the Compound format: http://www.openoffice.org/sc/compdocfileformat.pdf. I guess now I just have to figure out how to get what I want from that... maybe using OpenMCDF.
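
    The signature check described above can be sketched in C#. This is only an illustration: the names ContainerKind, ContainerSniffer and Detect are hypothetical, and the stream is assumed to be seekable (as the MemoryStream returned by the clipboard is).

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical names, introduced only for this sketch.
public enum ContainerKind { Unknown, ZipPackage, CompoundFile }

public static class ContainerSniffer
{
    // Compound file signature quoted in the post above.
    private static readonly byte[] CompoundMagic =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // "PK", the first two bytes of a zip-based package such as a .docx file.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B };

    public static ContainerKind Detect(Stream stream)
    {
        long start = stream.Position;
        var header = new byte[8];
        int read = 0;
        int n;

        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < header.Length &&
               (n = stream.Read(header, read, header.Length - read)) > 0)
        {
            read += n;
        }

        // Rewind so the caller can hand the stream to a real parser afterwards.
        stream.Position = start;

        if (read == header.Length && header.SequenceEqual(CompoundMagic))
            return ContainerKind.CompoundFile;
        if (read >= ZipMagic.Length && header.Take(ZipMagic.Length).SequenceEqual(ZipMagic))
            return ContainerKind.ZipPackage;
        return ContainerKind.Unknown;
    }
}
```

    A result of ContainerKind.ZipPackage would mean the stream can go straight to Package.Open, while ContainerKind.CompoundFile means the package has to be extracted from the wrapper first.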


    • Edited by NiNN Tuesday, July 2, 2013 3:21 PM Updated
    • Marked as answer by NiNN Tuesday, July 2, 2013 4:12 PM
    Tuesday, July 2, 2013 2:54 PM
  • <<Compound format>>

    Interesting - I can't recall having encountered that before. However, the article you reference describes the old BINARY file format (from 1.2 Abstract: "This document contains a description of the binary format of Microsoft Compound Document files."). Just a heads-up...


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, July 2, 2013 4:20 PM
    Moderator
  • I managed to extract the ZipPackage from the compound file using OpenMCDF, so I have everything working now. It just seems a bit overcomplicated, so I'm not sure it is the best solution... and I still haven't tried copying from earlier versions of Word. Maybe I'll end up using RTF from the clipboard instead...
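
    The thread does not show the actual code, so the following is only a rough sketch of how such an extraction might look. It assumes the OpenMcdf 2.x API (CompoundFile, RootStorage, GetStream, GetData) and assumes that the zip package is stored in a stream named "Package"; that stream name is not confirmed anywhere in this thread and should be verified by inspecting the compound file. EmbedSourceReader and OpenPackage are hypothetical names.

```csharp
using System.IO;
using System.IO.Packaging;   // WindowsBase
using OpenMcdf;              // third-party NuGet package; 2.x API assumed

// Hypothetical helper, introduced only for this sketch.
public static class EmbedSourceReader
{
    // streamName defaults to "Package", which is an ASSUMPTION about how
    // Word names the stream holding the zip package inside the compound file.
    public static Package OpenPackage(Stream embedSource, string streamName = "Package")
    {
        var compound = new CompoundFile(embedSource);
        try
        {
            // Copy the inner stream into memory; these bytes should start with "PK".
            byte[] zipBytes = compound.RootStorage.GetStream(streamName).GetData();

            // The package reads from its own MemoryStream, so it stays usable
            // after the compound file has been closed.
            return Package.Open(new MemoryStream(zipBytes), FileMode.Open, FileAccess.Read);
        }
        finally
        {
            compound.Close();
        }
    }
}
```

    With the clipboard code from the question, this would be called as EmbedSourceReader.OpenPackage(Clipboard.GetData("Embed Source") as MemoryStream), after checking the result of GetData for null.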

    Wednesday, July 3, 2013 10:54 AM
  • Thanks for coming back and outlining what worked in order to access the information on the Clipboard.

    Yes, if you have to support pre-2007 versions of Word, RTF would probably make more sense. Isn't that also the native format of the control you're inserting the data into?


    Cindy Meister, VSTO/Word MVP, my blog

    Wednesday, July 3, 2013 12:44 PM
    Moderator