Creating a WordprocessingDocument from XML

  • November 16, 2009 11:10 PM
     
     
    I have an XML stream that contains a valid Word XML document (if I save it to a file, I can open it with Word). How can I create a WordprocessingDocument from this XML without cycling it through Word?
    Ockert

All Replies

  • November 17, 2009 1:07 AM
     
     
    Try using WordprocessingDocument.Create(stream, wordprocessingDocumentType).
    Z.J.
  • November 17, 2009 1:40 PM
     
     
    I'm using the following code and it fails with "The OpenXmlPackage.Validate method found an error in the document." If I open the same XML with Word it opens without errors or warnings.

     

    string xmlText;
    ...
    byte[] xmlBytes;
    MemoryStream xmlStream;
    xmlBytes = (new UTF8Encoding()).GetBytes(xmlText);
    xmlStream = new MemoryStream();
    doc = WordprocessingDocument.Create(xmlStream, WordprocessingDocumentType.Document);
    xmlStream.Write(xmlBytes, 0, xmlBytes.Length);
    doc.Validate(new OpenXmlPackageValidationSettings());


    Ockert
  • November 18, 2009 9:31 AM
     
     
    In order to see more detail about the cause of a validation failure, you need to create an event handler method that accepts OpenXmlPackageValidationEventArgs and attach this method to the EventHandler event of your OpenXmlPackageValidationSettings instance.
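    As a minimal sketch of the handler described above, assuming the SDK 2.0-era package validation API discussed in this thread (later SDK releases deprecate it) and a hypothetical file path passed in by the caller:

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

static class ValidationExample
{
    // Called once per problem found; prints the validator's message.
    static void OnValidationError(object sender, OpenXmlPackageValidationEventArgs e)
    {
        Console.WriteLine("Validation error: " + e.Message);
    }

    // 'path' is a placeholder for an existing .docx file.
    public static void ValidatePackage(string path)
    {
        var settings = new OpenXmlPackageValidationSettings();
        settings.EventHandler += OnValidationError;

        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            doc.Validate(settings);
        }
    }
}
```

    With a handler attached, each problem should be reported through the handler rather than only as a single generic error.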
  • November 18, 2009 3:30 PM
     
     
    For some reason the WordprocessingDocument object does not parse the XML after the stream is written. Am I missing a step?
    Ockert
  • November 19, 2009 4:05 AM
     
     

    You cannot convert XML to a Word document directly. A Word document is a package with an internal structure (if you create a document in Word and change the file extension from ".docx" to ".zip", you can see that structure). So you need to build the internal structure when you create the document.

    The following sample may help.

    ==============================================================================================

                string xmlText;
                
                ... ...
                
                byte[] xmlBytes =  (new UTF8Encoding()).GetBytes(xmlText);

                using(WordprocessingDocument package = WordprocessingDocument.Create(@"d:\test.docx", WordprocessingDocumentType.Document))
                {
                    MainDocumentPart mainDocumentPart = package.AddMainDocumentPart();

                    mainDocumentPart.GetStream().Write(xmlBytes, 0, xmlBytes.Length);
                }

    ===============================================================================================


    Z.J.
  • November 19, 2009 4:07 AM
     
     
    Sorry, one thing I should have mentioned: xmlText must be valid Open XML, such as:

    string xmlText = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                                + "<w:document xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\" xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\" xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
                                + "<w:body>"
                                + "<w:p w:rsidR=\"00A2180E\" w:rsidRDefault=\"00EC4DA7\">"
                                + "<w:r>"
                                + "<w:t>t</w:t>"
                                + "</w:r>"
                                + "</w:p>"
                                + "<w:sectPr w:rsidR=\"00A2180E\" w:rsidSect=\"00A2180E\">"
                                + "<w:pgSz w:w=\"11906\" w:h=\"16838\" />"
                                + "<w:pgMar w:top=\"1440\" w:right=\"1800\" w:bottom=\"1440\" w:left=\"1800\" w:header=\"851\" w:footer=\"992\" w:gutter=\"0\" />"
                                + "<w:cols w:space=\"425\" />"
                                + "<w:docGrid w:type=\"lines\" w:linePitch=\"312\" />"
                                + "</w:sectPr>"
                                + "</w:body>"
                                + "</w:document>";
    Z.J.
  • November 19, 2009 2:59 PM
     
     Answered · Contains code

    It is hard to believe that the WordprocessingDocument object does not offer a native way to parse an XML Word document. I'm not sure whether that is on the roadmap. The bottom line is that you need to create each part and stream the content into the newly added part.

     

    The code below performs the basic function. It is by no means complete: the only media type it handles is image parts.

                                Document = WordprocessingDocument.Create(saveFileDialog1.FileName, DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
                                XmlNode relPart = xmlDoc.SelectSingleNode("pkg:package/pkg:part[@pkg:name='/word/_rels/document.xml.rels']/pkg:xmlData", nsm).FirstChild; 
                                XmlNodeList parts = xmlDoc.SelectNodes("pkg:package/pkg:part", GetNSM(xmlDoc));
                                MainDocumentPart mainPart = null;
                                Stream tempStream;
                                byte[] tempBytes;
                                string PackageName, id;
                                foreach (XmlNode part in parts)
                                {
                                    switch (part.Attributes["pkg:name"].Value)
                                    {
                                        case "/word/document.xml": //add the main part
                                            partName = "Main part";
    
                                            //add the main document part
                                            mainPart = Document.AddMainDocumentPart();
    
                                            //stream the content to the main part
                                            tempStream = mainPart.GetStream();
                                            tempBytes = (new UTF8Encoding()).GetBytes(part.FirstChild.InnerXml);
                                            tempStream.Write(tempBytes, 0, tempBytes.Length);
                                            break;
                                        case "/word/settings.xml": //Add settings part
                                            partName = "Settings part";
    
                                            //get the settings name
                                            PackageName = GetPackageName(part);
    
                                            //find the settings' id in the relations tag
                                            id = GetPackageID(relPart, PackageName);
    
                                            //add the settings part
                                            DocumentSettingsPart settingPart = mainPart.AddNewPart<DocumentSettingsPart>(id);
    
                                            //stream the content to the settings part
                                            tempStream = settingPart.GetStream();
                                            tempBytes = (new UTF8Encoding()).GetBytes(part.FirstChild.InnerXml);
                                            tempStream.Write(tempBytes, 0, tempBytes.Length);
                                            break;
                                        case "/word/webSettings.xml": //Add web settings part
                                            partName = "Web settings part";
    
                                            //get the web settings name
                                            PackageName = GetPackageName(part);
    
                                            //find the web settings' id in the relations tag
                                            id = GetPackageID(relPart, PackageName);
    
                                            //add the web settings part
                                            WebSettingsPart webSettingPart = mainPart.AddNewPart<WebSettingsPart>(id);
    
                                            //Stream the content to the web settings part
                                            tempStream = webSettingPart.GetStream();
                                            tempBytes = (new UTF8Encoding()).GetBytes(part.FirstChild.InnerXml);
                                            tempStream.Write(tempBytes, 0, tempBytes.Length);
                                            break;
                                        case "/docProps/core.xml": //Add core file properties part
                                            partName = "Core file properties part";
                                            
                                            //Add the core file properties
                                            CoreFilePropertiesPart corePart = Document.AddCoreFilePropertiesPart();
    
                                            //stream the content to the core file properties part
                                            tempStream = corePart.GetStream();
                                            tempBytes = (new UTF8Encoding()).GetBytes(part.FirstChild.InnerXml);
                                            tempStream.Write(tempBytes, 0, tempBytes.Length);
                                            break;
                                        case "/docProps/app.xml": //Add extended file properties part
                                            partName = "Extended file properties part";
    
                                            //Add the extended properties part
                                            ExtendedFilePropertiesPart extendedPart = Document.AddExtendedFilePropertiesPart();
    
                                            //stream the content to the extended property part
                                            tempStream = extendedPart.GetStream();
                                            tempBytes = (new UTF8Encoding()).GetBytes(part.FirstChild.InnerXml);
                                            tempStream.Write(tempBytes, 0, tempBytes.Length);
                                            break;
                                        case "/word/fontTable.xml"://Add font table part
                                            partName = "Font table part";
    
                                            //get the font table name
                                            PackageName = GetPackageName(part);
    
                                            //find the font table's id in the relations tag
                                            id = GetPackageID(relPart, PackageName);
    
                                            //Add the font part
                                            FontTablePart fontPart = mainPart.AddNewPart<FontTablePart>(id);
    
                                            //Stream the content to the font part
                                            tempStream = fontPart.GetStream();
                                            tempBytes = (new UTF8Encoding()).GetBytes(part.FirstChild.InnerXml);
                                            tempStream.Write(tempBytes, 0, tempBytes.Length);
                                            break;
                                        case "/word/styles.xml"://Add style part
                                            partName = "Style part";
    
                                            //get the style name
                                            PackageName = GetPackageName(part);
    
                                            //find the style's id in the relations tag
                                            id = GetPackageID(relPart, PackageName);
    
                                            //Add the style part
                                            StyleDefinitionsPart stylePart = mainPart.AddNewPart<StyleDefinitionsPart>(id);
    
                                            //stream the content to the style part
                                            tempStream = stylePart.GetStream();
                                            tempBytes = (new UTF8Encoding()).GetBytes(part.FirstChild.InnerXml);
                                            tempStream.Write(tempBytes, 0, tempBytes.Length);
                                            break;
                                        default:
                                            //add the media parts
                                            if (part.Attributes["pkg:name"].Value.Contains("/word/media"))
                                            {
                                                //add the image parts
                                                if (part.Attributes["pkg:contentType"].Value.Contains("image"))
                                                {
                                                    partName = "Image part";
    
                                                    //get the image type
                                                    string[] imageTypeParts = part.Attributes["pkg:contentType"].Value.Split('/');
                                                    string imageTypeName = string.Format("{0}{1}", imageTypeParts[1].Substring(0, 1).ToUpper(), imageTypeParts[1].Substring(1));
                                                    ImagePartType imagePartType = (ImagePartType)Enum.Parse(typeof(ImagePartType), imageTypeName);
    
                                                    //get the package name
                                                    PackageName = GetPackageName(part);
    
                                                    //find the image's id in the relations tag
                                                    id = GetPackageID(relPart, PackageName);
    
                                                    //Add the image part
                                                    ImagePart imagePart = mainPart.AddImagePart(imagePartType, id);
    
                                                    //Stream the image data to the image part
                                                    tempBytes = Convert.FromBase64String(part.SelectSingleNode("pkg:binaryData", GetNSM(xmlDoc)).InnerText);
                                                    tempStream = new MemoryStream(tempBytes);
                                                    imagePart.FeedData(tempStream);
                                                }
                                            }
                                            //add the theme
                                            if (part.Attributes["pkg:name"].Value.Contains("/word/theme"))
                                            {
                                                partName = "Theme part";
                                                //get the package name
                                                PackageName = GetPackageName(part);
    
                                                //find the theme's id in the relations tag
                                                id = GetPackageID(relPart, PackageName);
    
                                                //Add the theme part
                                                mainPart.AddNewPart<ThemePart>(id);
                                                ThemePart themePart = mainPart.ThemePart;
                                                
                                                //Stream the content to the theme part
                                                tempStream = themePart.GetStream();
                                                tempBytes = (new UTF8Encoding()).GetBytes(part.FirstChild.InnerXml);
                                                tempStream.Write(tempBytes, 0, tempBytes.Length);
                                            }
                                            break;
                                    }
                                }
                                Document.Validate(validationSettings);
                                Document.Close();
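
    A more generic route than the per-part switch above can be sketched with System.IO.Packaging instead of the Open XML SDK. This is an assumption-laden sketch, not the SDK's own method: it assumes the input is a Flat OPC document (a pkg:package root with pkg:part children, as in the XPath queries above), and the class and method names are hypothetical. It copies every part regardless of type, then rebuilds the relationships from the .rels parts.

```csharp
using System;
using System.IO;
using System.IO.Packaging;   // WindowsBase.dll on .NET Framework
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class FlatOpcReader   // hypothetical helper, not part of any SDK
{
    static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";
    const string RelsType = "application/vnd.openxmlformats-package.relationships+xml";

    public static void FlatOpcToDocx(XDocument flat, string docxPath)
    {
        using (Package package = Package.Open(docxPath, FileMode.Create))
        {
            // Pass 1: create every content part; .rels parts are handled in pass 2.
            foreach (XElement xmlPart in flat.Root.Elements(Pkg + "part"))
            {
                string name = (string)xmlPart.Attribute(Pkg + "name");
                string contentType = (string)xmlPart.Attribute(Pkg + "contentType");
                if (contentType == RelsType) continue;

                PackagePart part = package.CreatePart(
                    new Uri(name, UriKind.Relative), contentType, CompressionOption.Normal);
                using (Stream target = part.GetStream(FileMode.Create))
                {
                    XElement xmlData = xmlPart.Element(Pkg + "xmlData");
                    if (xmlData != null)
                    {
                        // XML part: write the single child of pkg:xmlData as its own document.
                        using (XmlWriter writer = XmlWriter.Create(target))
                            new XDocument(xmlData.Elements().First()).Save(writer);
                    }
                    else
                    {
                        // Binary part (for example an image): decode pkg:binaryData.
                        byte[] bytes = Convert.FromBase64String(
                            (string)xmlPart.Element(Pkg + "binaryData"));
                        target.Write(bytes, 0, bytes.Length);
                    }
                }
            }

            // Pass 2: recreate relationships, keeping the original ids.
            foreach (XElement xmlPart in flat.Root.Elements(Pkg + "part"))
            {
                if ((string)xmlPart.Attribute(Pkg + "contentType") != RelsType) continue;
                string name = (string)xmlPart.Attribute(Pkg + "name");

                PackagePart source = null;   // null means package-level relationships
                if (name != "/_rels/.rels")
                {
                    // "/word/_rels/document.xml.rels" belongs to "/word/document.xml".
                    string dir = name.Substring(0, name.IndexOf("/_rels", StringComparison.Ordinal));
                    string file = name.Substring(name.LastIndexOf('/'));
                    file = file.Substring(0, file.Length - ".rels".Length);
                    source = package.GetPart(new Uri(dir + file, UriKind.Relative));
                }

                foreach (XElement r in xmlPart.Descendants(Rel + "Relationship"))
                {
                    string id = (string)r.Attribute("Id");
                    string type = (string)r.Attribute("Type");
                    bool external = (string)r.Attribute("TargetMode") == "External";
                    Uri uri = new Uri((string)r.Attribute("Target"), UriKind.RelativeOrAbsolute);
                    TargetMode mode = external ? TargetMode.External : TargetMode.Internal;

                    if (source == null) package.CreateRelationship(uri, mode, type, id);
                    else source.CreateRelationship(uri, mode, type, id);
                }
            }
        }
    }
}
```

    The resulting file can then be opened with WordprocessingDocument.Open in the usual way. Two passes are used because a part must exist before relationships can be created from it.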
    

     


    Ockert
  • November 20, 2009 2:58 AM
     
     

    The Open XML SDK is complex at its current stage; after all, it is only a CTP for now. I believe it will improve in the future :)


    Z.J.
  • November 20, 2009 3:46 AM
     
     

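    The old validation method, OpenXmlPackage.Validate(), needs an event handler before it reports any detail. You can instead try the new validation feature, OpenXmlValidator,

    for example:

    string testfile = @"d:\test.docx";
    OpenXmlValidator validator = new OpenXmlValidator();
    // In the released SDK, Validate() takes an opened package rather than a file path.
    using (WordprocessingDocument doc = WordprocessingDocument.Open(testfile, false))
    {
        var errors = validator.Validate(doc);
    }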

  • December 30, 2009 2:53 AM
     
     

    Adding an image seems *MUCH* too complex.  I should be able to append an image into a body like a paragraph, or some text, or a run, rather than having to go way up into the Main document just to create an "imagepart".  This should NOT exist:

    ImagePart imagePart = mainPart.AddImagePart(imagePartType, id);

    If I simply want to "paste" an image somewhere in a sequence of items being appended to a body, it should be more like:

    Dim I as Image = New Image("C:\MyPic.jpg")
    myBody.Append(I)

    Done!

  • January 7, 2010 8:26 AM
     
     

    Hi Carl Cook,

    To add an image to a body, we need not only to call MainDocumentPart.AddImagePart() but also to append an image reference in the document body. It is somewhat complex. My suggestion is:
    1. Create a new document in Word, insert a picture into a paragraph, then close it.
    2. Open the document with the Open XML SDK 2.0 Productivity Tool for Microsoft Office (which can be downloaded from: http://www.microsoft.com/downloads/details.aspx?FamilyID=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en)
    3. Use "Reflect Code" to see how to generate it with SDK 2.0.

    I hope this helps. If you have any questions, please let me know.

    Thanks,

    Lu

  • January 7, 2010 6:25 PM
     
     
    Do you have this for the other direction too? DOCX -> Word XML?
    Windward Reports - World's Greatest SharePoint Reporting & DocGen
  • January 15, 2010 4:48 PM
     
     
    I have been working on that. The process is to first create the core structure of the XML document:

     

    WordprocessingDocument doc = WordprocessingDocument.Open(<DOCX document path>, true);

     

    private static XNamespace ns_rel = "http://schemas.openxmlformats.org/package/2006/relationships";
    private static XNamespace ns_pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";

    XProcessingInstruction app = new XProcessingInstruction("mso-application", "progid=\"Word.Document\"");
    XDocument part;
    OpenXmlPart xmlPart;
    XElement FileRelationshipsPart, fileRelationships, DocumentRelationshipsPart, documentRelationships;
    X_Document = new XDocument(app,
        new XElement(ns_pkg + "package",
            new XAttribute(XNamespace.Xmlns + "pkg", ns_pkg),
            FileRelationshipsPart = new XElement(ns_pkg + "part",
                new XAttribute(ns_pkg + "name", "/_rels/.rels"),
                new XAttribute(ns_pkg + "contentType", "application/vnd.openxmlformats-package.relationships+xml"),
                new XAttribute(ns_pkg + "padding", "512"),
                new XElement(ns_pkg + "xmlData",
                    fileRelationships = new XElement(ns_rel + "Relationships",
                        new XAttribute("xmlns", ns_rel)))),
            DocumentRelationshipsPart = new XElement(ns_pkg + "part",
                new XAttribute(ns_pkg + "name", "/word/_rels/document.xml.rels"),
                new XAttribute(ns_pkg + "contentType", "application/vnd.openxmlformats-package.relationships+xml"),
                new XAttribute(ns_pkg + "padding", "256"),
                new XElement(ns_pkg + "xmlData",
                    documentRelationships = new XElement(ns_rel + "Relationships",
                        new XAttribute("xmlns", ns_rel))))));
    fileRelationships.Add(
        new XElement(ns_rel + "Relationship",
            new XAttribute("Id", "rId3"),
            new XAttribute("Type", "http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties"),
            new XAttribute("Target", "docProps/app.xml")),
        new XElement(ns_rel + "Relationship",
            new XAttribute("Id", "rId2"),
            new XAttribute("Type", "http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties"),
            new XAttribute("Target", "docProps/core.xml")),
        new XElement(ns_rel + "Relationship",
            new XAttribute("Id", "rId1"),
            new XAttribute("Type", "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"),
            new XAttribute("Target", "word/document.xml")));

    Then iterate through the parts in the document and add each part's XML to the XML document:

    xmlPart = doc.MainDocumentPart;
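
    The part-by-part step can also be sketched generically. The following is a rough sketch, assuming System.IO.Packaging is used to enumerate the parts (including the .rels parts) instead of walking the SDK's typed parts; the class and method names are hypothetical, and optional attributes such as pkg:padding are left out.

```csharp
using System;
using System.IO;
using System.IO.Packaging;   // WindowsBase.dll on .NET Framework
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class FlatOpcWriter   // hypothetical helper, not part of any SDK
{
    static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Turns one package part into a pkg:part element.
    static XElement PartToXml(PackagePart part)
    {
        var element = new XElement(Pkg + "part",
            new XAttribute(Pkg + "name", part.Uri.ToString()),
            new XAttribute(Pkg + "contentType", part.ContentType));

        using (Stream source = part.GetStream(FileMode.Open, FileAccess.Read))
        {
            if (part.ContentType.EndsWith("xml", StringComparison.Ordinal))
            {
                // XML part: embed its root element inside pkg:xmlData.
                using (XmlReader reader = XmlReader.Create(source))
                    element.Add(new XElement(Pkg + "xmlData", XDocument.Load(reader).Root));
            }
            else
            {
                // Binary part: embed the bytes as base64 inside pkg:binaryData.
                using (var buffer = new MemoryStream())
                {
                    source.CopyTo(buffer);
                    element.Add(new XElement(Pkg + "binaryData",
                        Convert.ToBase64String(buffer.ToArray(), Base64FormattingOptions.InsertLineBreaks)));
                }
            }
        }
        return element;
    }

    public static XDocument DocxToFlatOpc(string docxPath)
    {
        using (Package package = Package.Open(docxPath, FileMode.Open, FileAccess.Read))
        {
            return new XDocument(
                new XProcessingInstruction("mso-application", "progid=\"Word.Document\""),
                new XElement(Pkg + "package",
                    new XAttribute(XNamespace.Xmlns + "pkg", Pkg),
                    package.GetParts().Select(PartToXml).ToList()));
        }
    }
}
```

    Because the relationship parts are ordinary XML parts in the package, they are emitted by the same loop and do not need to be built by hand.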


    Ockert
  • January 18, 2010 5:10 AM
     
     
    Hi Ockert,
    The Word document as a whole cannot simply be represented as an XML stream. It uses ZIP technology to pack the individual parts. Is your XML stream in Office 2003 format?
    Also, the Open() method on WordprocessingDocument can open a stream and parse its content. The stream needs to conform to the Open XML File Format (Word 2007 and later), though.
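
    For illustration, a minimal sketch of opening a package from a stream, assuming the bytes are a complete zipped .docx package rather than raw XML (the class and method names are hypothetical):

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

static class OpenFromStreamExample
{
    // 'docxBytes' must hold a full .docx (ZIP) package, for example read from a database.
    public static void PrintBodyText(byte[] docxBytes)
    {
        using (var stream = new MemoryStream(docxBytes))
        using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, false))
        {
            // Print the plain text of the document body.
            Console.WriteLine(doc.MainDocumentPart.Document.Body.InnerText);
        }
    }
}
```
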

    Thanks,
    --L.
  • January 20, 2010 4:46 PM
     
     

    Lanqing,

    The main advantage of Office 2007 and beyond for me is that documents can be represented as an XML stream.

    The Open XML SDK for Microsoft Office is a great library that provides all sorts of API calls for working on documents, especially version 2.0. Unfortunately, it does not give me a way to manipulate the document as I need to. My project stores the XML streams of document fragments (each a valid XML representation of a document) in a database. I then need to make small changes to the stored documents as I assemble a set of them into a new document. I have been using the Office Object Model, but it has proven a bit unpredictable: Office throws COM errors that are difficult to manage from C#, and it requires Office to run in unattended mode, which is not recommended. Being able to manipulate the XML directly will give me much more flexibility and should be more reliable.

    Thanks

    Ockert


    Ockert
  • July 22, 2010 4:30 PM
     
     

    Guys,

    Any update on this? I am facing the same problem. Has this been solved in Open XML 2.0?

     

    Noam

  • August 16, 2010 5:00 PM
     
     

    A Word 2007 document can be represented as an XML string as a whole.


    Ockert
  • July 24, 2012 7:18 AM
     
     

    A Word XML document can be converted to DOCX format. I was working on this recently and found a solution. If you have the MS Word XML document saved in a database or on your system, you need to load it into an XmlDocument. I did it as follows:

     XmlDocument xDoc = new XmlDocument();
     xDoc.Load(@"D:\xmlDoc.xml");

     XmlNodeList bodycontent = xDoc.GetElementsByTagName("w:body");     // extract the body part from the word xml document.
                XmlNode body_node = bodycontent[0];


                using (WordprocessingDocument mainDocument = WordprocessingDocument.Create(@"D:\RawDoc.docx",DocumentFormat.OpenXml.WordprocessingDocumentType.Document))  
                {
                    MainDocumentPart mainPart = mainDocument.AddMainDocumentPart();

                    // Create the document structure and add some text.
                        mainPart.Document = new DocumentFormat.OpenXml.Wordprocessing.Document();                
                        XElement tempBody = XElement.Parse(body_node.OuterXml);
                        mainDocument.MainDocumentPart.Document.AppendChild(new Body(tempBody.ToString()));
                        mainDocument.MainDocumentPart.Document.Save();
                        mainDocument.Package.Flush();
                    }

      