Word Document Generation from .dotx - Corrupt Output File

  • Question

  • Hello,

    First, a little background: I am generating a report from a .dotx template using the Open XML SDK 2.0 and a SharePoint sequential workflow, and I am running into some issues.  The template is pulled from a SharePoint document library. It uses content controls that are bound, with the Word 2007 Content Control Toolkit, to a CustomXML file generated by the same tool.  Whenever I try to open the generated file in either Word or the Open XML SDK 2.0 Productivity Tool, it is reported as corrupt and will not open.  Oddly enough, Word can recover enough content to open the template as a new document, but without my CustomXML files.

    I think I've narrowed the problem down to this area of my code:

    private void replaceCustomXML(WordprocessingDocument doc, string xml, CustomXmlPart part)
    {
        doc.MainDocumentPart.DeletePart(part);
        CustomXmlPart newXMlPart = doc.MainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
        using (StreamWriter ts = new StreamWriter(newXMlPart.GetStream())) ts.Write(xml);
    }

    This is basically the ReplaceCustomXML function from here, modified to work with version 2.0 of the SDK (as others have noted, the sample code doesn't work with 2.0) and changed to delete only the 'part' that contains the content controls' XML, which is then replaced with a new part holding the modified XML.  When I inspect the output file as a zip archive (after changing its extension), I can see that the previous CustomXML file and its properties file ("item2.xml" and "item2Props.xml") have both been deleted, and that a new file called "item6.xml" appears after the last XML file.  The "item6.xml" file cannot be extracted from the zip archive; attempting to do so gives errors.
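
Deleting the part also removes its "item2Props.xml" companion, which carries the store item ID that the content control bindings refer to. One alternative, sketched here under the assumption that the existing part only needs new content, is to overwrite the part in place so that its file name, relationship, and properties part all survive. `CustomXmlHelper` and `OverwriteCustomXml` are hypothetical names, not part of the SDK or the original post; `FeedData` is the SDK call that does the work.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml.Packaging;

public static class CustomXmlHelper
{
    // Hypothetical helper (not from the original post): overwrites the payload
    // of an existing custom XML part instead of deleting and re-adding it.
    public static void OverwriteCustomXml(CustomXmlPart part, string xml)
    {
        // UTF-8 without a byte order mark. Assumption: the XML string carries
        // no encoding declaration that conflicts with UTF-8.
        byte[] payload = new UTF8Encoding(false).GetBytes(xml);

        using (var source = new MemoryStream(payload))
        {
            // FeedData copies the source stream into the part, replacing
            // whatever content the part held before.
            part.FeedData(source);
        }
    }
}
```

With this in place, the call in the workflow would become something like `CustomXmlHelper.OverwriteCustomXml(controlsXMLPart, controlsXMLDocument.ToString());`, and no "item6.xml" would be created.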

    None of the errors mentioned above describe what is causing them.

    The extra CustomXML items exist because SharePoint seems to add them to files when they are added to a document library.  What doesn't get generated for my new part, however, is the "item#Props.xml" file that the others have.  This is the only guess I can make as to the problem.  In case it helps, here is the rest of the relevant code (some constant strings and ints are not listed here because they are correct and not relevant to the issue):

    String documentName = DOCUMENT_TITLE;
    
    WordprocessingDocument statusReportTemplate;
    SPDocumentLibrary templateLibrary;
    CustomXmlPart controlsXMLPart;
    XDocument controlsXMLDocument;
    MemoryStream newReportStream = new MemoryStream();
    byte[] xmlData, fileData;
    
    //Make sure that the template list is actually a document library
    if (web.Lists[SiteConstants.Lists.TEMPLATE_LIBRARY] is SPDocumentLibrary)
        templateLibrary = web.Lists[SiteConstants.Lists.TEMPLATE_LIBRARY] as SPDocumentLibrary;
    else
    {
        historyDescription = SiteConstants.Lists.TEMPLATE_LIBRARY + " is not a library!";
        return;
    }
    
    //Get the OpenXml WordProcessingDocument representing the status report template from the template library
    fileData = web.GetFile(SiteConstants.URLs.TEMPLATE).OpenBinary();
    newReportStream.Write(fileData, 0, fileData.Length);
    statusReportTemplate = WordprocessingDocument.Open(newReportStream, true);
                
    //The content control binding xml is stored as the second custom xml document in the template
    controlsXMLPart = statusReportTemplate.MainDocumentPart.GetPartsOfType<CustomXmlPart>().ElementAt<CustomXmlPart>(CONTROLS_XML_INDEX);
    
    xmlData = new byte[controlsXMLPart.GetStream().Length];
    controlsXMLPart.GetStream().Read(xmlData, 0, (int)controlsXMLPart.GetStream().Length);
    controlsXMLDocument = XDocument.Parse(System.Text.Encoding.Default.GetString(xmlData));
    
    //Set all the Content Control Tags to dummy data
    foreach (XElement element in controlsXMLDocument.Root.Elements())
    {
        element.SetValue("Text Replaced!");
    }
    
    String xmlTest = controlsXMLDocument.ToString();
    replaceCustomXML(statusReportTemplate, controlsXMLDocument.ToString(), controlsXMLPart);
    
    using (FileStream fstream = new FileStream("c:\\TestDoc.dotx", System.IO.FileMode.CreateNew))
    {
        newReportStream.WriteTo(fstream);
    }
                
    statusReportTemplate.Dispose();

    The XML is retrieved from the document and edited correctly, and it looks perfect in string form in the "xmlTest" variable.  Also, this method of loading the WordprocessingDocument into memory and saving it back out to a file works fine as long as I don't modify the MainDocumentPart's CustomXmlParts.  Does anyone have an idea of what is going wrong here?
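
One thing worth checking in the code above: the MemoryStream is copied to disk before `statusReportTemplate.Dispose()` runs. The package generally writes pending changes back to its stream only when it is closed, so a copy taken earlier can contain a half-written zip archive. That would also fit the observation that the file is fine when nothing is modified. The following is a minimal sketch of the reordered flow, not a confirmed fix; `TemplateSaver` and `EditInMemory` are hypothetical names.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class TemplateSaver
{
    // Hypothetical helper: applies an edit to a package held in memory and
    // returns the finished bytes only after the package has been closed.
    public static byte[] EditInMemory(byte[] templateBytes, Action<WordprocessingDocument> edit)
    {
        using (var buffer = new MemoryStream())
        {
            // Copy into an expandable stream so the package can grow.
            buffer.Write(templateBytes, 0, templateBytes.Length);

            using (WordprocessingDocument doc = WordprocessingDocument.Open(buffer, true))
            {
                edit(doc);
            } // Disposing the document flushes the package into the buffer.

            // Snapshot the bytes only after the package is closed.
            return buffer.ToArray();
        }
    }
}
```

The result could then be written with `File.WriteAllBytes(path, bytes)`, where `path` is whatever output location applies. Note that `FileMode.CreateNew`, as used in the original code, throws if the file already exists.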

    Friday, March 2, 2012 9:48 PM

All replies

  • I'm fighting the same issue. I was wondering if you have found a solution yet?

    Wednesday, March 21, 2012 7:32 PM
  • This probably won't solve your problem directly, but this kind of error turns up very quickly when working with Open XML. Sometimes it is not fatal, and MS Word will offer to repair the document (as in your case). That is usually a big help: you can use the repaired document as a reference, compare it with the corrupt one, and look for the differences between them.

    You may also take a look at the Docentric toolkit, which targets document generation scenarios. It has the features of a reporting toolkit, except that you create templates in MS Word and its report engine outputs .docx documents.
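
The comparison described above can be started at the zip level, since both files are zip archives. This is a rough sketch using only `System.IO.Compression` (.NET 4.5 or later); `PackageDiff` and its methods are hypothetical names, and it only compares entry names and uncompressed sizes, not XML content.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class PackageDiff
{
    // Maps each zip entry name to its uncompressed length.
    public static Dictionary<string, long> ListEntries(Stream package)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries.ToDictionary(e => e.FullName, e => e.Length);
        }
    }

    // Reports entries that exist in only one package or differ in size.
    public static IEnumerable<string> Compare(Stream corrupt, Stream repaired)
    {
        Dictionary<string, long> a = ListEntries(corrupt);
        Dictionary<string, long> b = ListEntries(repaired);

        foreach (string name in a.Keys.Union(b.Keys).OrderBy(n => n, StringComparer.Ordinal))
        {
            long sizeA, sizeB;
            bool inA = a.TryGetValue(name, out sizeA);
            bool inB = b.TryGetValue(name, out sizeB);

            if (!inA) yield return "only in repaired: " + name;
            else if (!inB) yield return "only in corrupt: " + name;
            else if (sizeA != sizeB) yield return "size differs: " + name;
        }
    }
}
```

If `ZipArchive` throws an `InvalidDataException` on the corrupt file, the archive itself is damaged rather than just one part inside it, which is a useful hint on its own.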

    Wednesday, April 4, 2012 12:54 PM
  • Hi,

    Please refer to the forum thread http://social.msdn.microsoft.com/Forums/en-US/oxmlsdk/thread/3bf41b18-e325-4349-8bf9-e6f2e97b1baf, which discusses a similar requirement.

    Hope this helps. Please let me know if there are further questions. 

    -Regards

    Pradip

    Wednesday, April 4, 2012 4:34 PM
  • Hi,

    I think the problem is that you aren't writing the custom XML as Unicode. Try specifying the encoding as below:

    private void replaceCustomXML(WordprocessingDocument doc, string xml, CustomXmlPart part)
    {
        doc.MainDocumentPart.DeletePart(part);
        CustomXmlPart newXMlPart = doc.MainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
        using (StreamWriter ts = new StreamWriter(newXMlPart.GetStream(), Encoding.Unicode)) ts.Write(xml);
    }
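
For context on the encoding question: a `StreamWriter` created without an encoding argument writes UTF-8 without a byte order mark, while `Encoding.Unicode` writes UTF-16 little-endian with one. Whichever is used, the bytes and any XML declaration in the string need to agree. One way to sidestep the issue, shown here as a sketch rather than a verified fix, is to let an `XmlWriter` serialize the `XDocument` so that it emits a matching declaration itself. `XmlPartWriter` and `WriteXml` are hypothetical names.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlPartWriter
{
    // Hypothetical helper: serializes the document to the target stream with
    // an XML declaration that matches the bytes actually written.
    public static void WriteXml(XDocument document, Stream target)
    {
        var settings = new XmlWriterSettings
        {
            // UTF-8 without a byte order mark.
            Encoding = new UTF8Encoding(false)
        };

        // CloseOutput defaults to false, so the caller keeps ownership of
        // the target stream.
        using (XmlWriter writer = XmlWriter.Create(target, settings))
        {
            document.Save(writer);
        }
    }
}
```

A caller would pass the part's stream, for example one obtained from `GetStream(FileMode.Create, FileAccess.Write)`, and dispose that stream afterwards.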

    Thursday, April 19, 2012 11:59 AM
  • I had the same issue. To fix it, I had to add the following lines of code:

    MainDocumentPart mainPart = wordDoc.MainDocumentPart;
    mainPart.Document.Save();

    Friday, June 29, 2012 3:27 AM
  • Hi,

    I have just started with OpenXML and am trying to modify an application that generates Word documents dynamically. The application takes several MS XML documents (e.g. P1.xml, P2.xml) as input, reads them with the .NET XmlDocument class, merges them by appending the inner XML of the <w:body> tag of P2.xml to the <w:body> of P1.xml, and saves the result as an XML document. I want to save this XML document as a .docx instead. I am using the following code:

    /* XDoc is an XmlDocument object. It contains the XML representation of the Word file;
       when I open that XML with MS Word, it opens correctly. */
    byte[] xmlBytes = (new UTF8Encoding()).GetBytes(XDoc.InnerXml);

    using (WordprocessingDocument package = WordprocessingDocument.Create(@"D:\FinalDoc.Docx", DocumentFormat.OpenXml.WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainDocumentPart = package.AddMainDocumentPart();

        mainDocumentPart.GetStream().Write(xmlBytes, 0, xmlBytes.Length);
    }

    But the files created with this code are always corrupt.
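
The post does not say which XML flavor P1.xml and P2.xml use, so the following is only a diagnostic sketch. Word can open several single-file XML formats (a bare `w:document`, a Flat OPC `pkg:package`, or Word 2003 `w:wordDocument`), but only the first is valid content for the main document part. `WordXmlKind` and `Describe` are hypothetical names; the namespace URIs are the published ones for each format.

```csharp
using System.Xml;

public static class WordXmlKind
{
    const string WordMl2006 = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    const string FlatOpc = "http://schemas.microsoft.com/office/2006/xmlPackage";
    const string WordMl2003 = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Hypothetical helper: names the kind of Word XML based on the root element.
    public static string Describe(XmlDocument doc)
    {
        XmlElement root = doc.DocumentElement;
        if (root == null) return "empty";

        if (root.LocalName == "document" && root.NamespaceURI == WordMl2006)
            return "main document part";   // can be written into word/document.xml
        if (root.LocalName == "package" && root.NamespaceURI == FlatOpc)
            return "flat OPC package";     // a whole package in a single XML file
        if (root.LocalName == "wordDocument" && root.NamespaceURI == WordMl2003)
            return "Word 2003 XML";        // legacy format, not Open XML

        return "unknown";
    }
}
```

If `Describe(XDoc)` returns anything other than "main document part", writing the bytes straight into the main part would be expected to give a file that Word rejects, and the XML would need converting first.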

    Monday, July 23, 2012 7:19 AM