Convert WordProcessingML to Docx

  • Question

  • Hi,

    We have templates (XSLTs) that create WordprocessingML in the Office 2003 XML format (a single XML file).

    Now we need to put that XML into DOCX format. The WordML is very similar to the contents of the "document.xml" file inside the docx package, but it keeps all sections, footers, etc. in one file, while in docx they are separated.

    Is there some API call that we could use to embed our xml into DOCX? This doesn't work:

    WordprocessingDocument doc = WordprocessingDocument.Create(@"e:\asdf.docx", WordprocessingDocumentType.Document, true);
    var mainDoc = doc.AddMainDocumentPart();
    mainDoc.AddCustomXmlPart(xmlString); // throws a null reference exception
    //example of xmlString (well formed xml, created in word, saved as "Word 2003 XML Document"):
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <?mso-application progid="Word.Document"?> <w:wordDocument xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:dt="uuid:C2F41010-65B3-11
    .....
    <w:tblGrid><w:gridCol w:w="802"/><w:gridCol w:w="7757"/><w:gridCol w:w="2361"/>
    </w:tblGrid><w:tr wsp:rsidR="001832AF"><w:trPr><w:cantSplit/><w:tblCellSpacing w:w="15" w:type="dxa"/>
    </w:trPr><w:tc><w:tcPr><w:tcW w:w="200" w:type="dxa"/><w:tcBorders><w:top w:val="outset" w:sz="6" wx:bdrwidth="15" w:space="0"
    w:color="000000"/><w:left w:val="outset" w:sz="6" wx:bdrwidth="15" w:space="0" w:color="000000"/><w:bottom w:val="outset"
    w:sz="6" wx:bdrwidth="15" w:space="0" w:color="000000"/><w:right w:val="outset" w:sz="6" wx:bdrwidth="15" w:space="0"
    w:color="000000"/></w:tcBorders></w:tcPr><w:p wsp:rsidR="001832AF" wsp:rsidRDefault="00AE4DEA"><w:pPr><w:jc w:val="right"/>
    </w:pPr><w:r><w:t>
    .........

    Wednesday, September 14, 2011 3:40 PM

Answers

  • Hi VelJkoz,

    You can save the Word 2003 file as HTML and then embed it in the docx file via altChunk. Here is a code snippet for reference:

     

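            // Requires System.IO, System.Linq, System.Xml.Linq and
            // DocumentFormat.OpenXml.Packaging.
            // GetXDocument and SaveXDocument are helpers that are not part of the
            // Open XML SDK; they load and save the main document part as an XDocument.
            XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XNamespace r = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

            using (WordprocessingDocument myDoc =
                WordprocessingDocument.Open("Test2.docx", true))
            {
                string altChunkId = "AltChunkId1";
                MainDocumentPart mainPart = myDoc.MainDocumentPart;
                // Create the import part that holds the HTML and copy the file into it.
                AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                    "application/xhtml+xml", altChunkId);
                using (FileStream fileStream = File.Open("xmltest.htm", FileMode.Open))
                    chunk.FeedData(fileStream);
                // Build the w:altChunk element that references the import part by its id.
                XElement altChunk = new XElement(w + "altChunk",
                    new XAttribute(r + "id", altChunkId)
                );
                XDocument mainDocumentXDoc = GetXDocument(myDoc);
                // Add the altChunk element after the last paragraph.
                mainDocumentXDoc.Root
                    .Element(w + "body")
                    .Elements(w + "p")
                    .Last()
                    .AddAfterSelf(altChunk);
                SaveXDocument(myDoc, mainDocumentXDoc);
            }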

    Please also see this article:

    http://blogs.msdn.com/b/ericwhite/archive/2008/10/27/how-to-use-altchunk-for-document-assembly.aspx

    It contains code for inserting HTML into a docx file while keeping almost the same formatting.
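
The snippet above calls GetXDocument and SaveXDocument and uses the namespace variables w and r without defining them. The following is a minimal sketch of what such helpers could look like, assuming the Open XML SDK (DocumentFormat.OpenXml) is referenced. The class name MainPartXml is hypothetical, and the linked article's own helpers may differ in detail.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper class; not part of the Open XML SDK.
public static class MainPartXml
{
    // Namespaces referred to as `w` and `r` in the snippet.
    public static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XNamespace r =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Load the main document part (word/document.xml) into an XDocument.
    public static XDocument GetXDocument(WordprocessingDocument doc)
    {
        using (Stream stream = doc.MainDocumentPart.GetStream(FileMode.Open, FileAccess.Read))
        using (XmlReader reader = XmlReader.Create(stream))
            return XDocument.Load(reader);
    }

    // Overwrite the main document part with the modified XDocument.
    public static void SaveXDocument(WordprocessingDocument doc, XDocument xDoc)
    {
        // FileMode.Create truncates the part so no stale bytes remain.
        using (Stream stream = doc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))
        using (XmlWriter writer = XmlWriter.Create(stream))
            xDoc.Save(writer);
    }
}
```

To call these unqualified, as the snippet does, either place the snippet in the same class or add `using static MainPartXml;` at the top of the file.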

    Hope this helps, and feel free to follow up after you have tried it.

    Best Regards,


    Bruce Song [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.


    • Edited by Bruce Song Wednesday, September 21, 2011 4:08 AM
    • Proposed as answer by Bruce Song Thursday, September 29, 2011 9:30 AM
    • Marked as answer by veljkoz2 Thursday, September 29, 2011 9:42 AM
    Wednesday, September 21, 2011 4:07 AM

All replies

  • Hi Veljkoz,

    Thank you for posting.

    As far as I know, a custom XML part is used for binding content controls: we can bind content controls to elements in a custom XML part. I see that you want to embed Office Word 2003 XML, and I don't think that can be achieved this way, because the XML file formats enable applications to work with documents in ways that are not possible with the older binary file formats (such as .xls, .ppt, and .doc).

    The XML string for a custom XML part should look like this:

        string xmlString =
            "<?xml version=\"1.0\" encoding=\"utf-8\" ?>" +
            "<employees xmlns=\"http://schemas.microsoft.com/vsto/samples\">" +
                "<employee>" +
                    "<name>Karina Leal</name>" +
                    "<hireDate>1999-04-01</hireDate>" +
                    "<title>Manager</title>" +
                "</employee>" +
            "</employees>";
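
For completeness, here is a sketch of how a string like this could be stored as a custom XML part, assuming the Open XML SDK is referenced. As far as I can tell, the string overload of AddCustomXmlPart expects a content type rather than the XML itself, so the XML is fed into the part as a stream instead. The class and method names (CustomXmlExample, AddCustomXml) are made up for illustration.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper class; not part of the Open XML SDK.
public static class CustomXmlExample
{
    // Adds `xml` as a new custom XML part of the main document part.
    public static CustomXmlPart AddCustomXml(WordprocessingDocument doc, string xml)
    {
        MainDocumentPart mainPart = doc.MainDocumentPart ?? doc.AddMainDocumentPart();

        // The argument selects the kind of part; it is not the XML content.
        CustomXmlPart part = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);

        // Copy the XML text into the new part as UTF-8 bytes.
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
            part.FeedData(stream);

        return part;
    }
}
```

This only stores the data inside the package. It does not turn the XML into document content, which is why it does not help with embedding WordML.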

    Hope this gives you a hint.

    Best Regards,


    Bruce Song [MSFT]

    Tuesday, September 20, 2011 8:57 AM
  • I see... so adding XML is really not what I'm after - I want to be able to add WordML directly to docx.

    Isn't there some API call to add preformatted content in the form of WordML directly to document.xml (file that's residing in .docx package)?

    Tuesday, September 20, 2011 1:12 PM
  • We'll need some time to modify our XSLTs to generate HTML instead of Word 2003 XML (we can't convert from XML to HTML afterwards, since generation happens on a server where there is no Word and no automation).

    I'll let you know how it goes, thanks!
    Thursday, September 22, 2011 4:19 PM
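
Generating the HTML on the server needs only the .NET base class library, not Word. The sketch below applies a stylesheet to an input document with XslCompiledTransform and returns the result as a string. The class name HtmlFromXslt is hypothetical, and the stylesheet and input are whatever the existing templates and data are.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper class for running an XSLT stylesheet in process.
public static class HtmlFromXslt
{
    // Applies the stylesheet text `xslt` to `inputXml` and returns the output.
    public static string Transform(string xslt, string inputXml)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(xslt)))
            transform.Load(styleReader);

        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            // The TextWriter overload honors the stylesheet's xsl:output settings.
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

With a stylesheet that declares `<xsl:output method="html"/>`, the returned string can be written to a file and passed to the altChunk code from the accepted answer in place of "xmltest.htm".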