Preserving header and footer in merged Word document

  • Question

  • I am trying to merge multiple DOCX files into one document using Open XML SDK 2.  Using the code below (simplified to merge one file), I see the body of the source document in the target document, but I do not get the header and footer to merge.  Can someone please help?  Thank you.

          Using myDoc As WordprocessingDocument = WordprocessingDocument.Open("c:\temp\starter.docx", True)
                Dim altChunkId As String = "AltChunkId1"
                Dim mainPart As MainDocumentPart = myDoc.MainDocumentPart
                Dim chunk As AlternativeFormatImportPart = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId)
                Using fileStream As FileStream = File.Open("c:\temp\letter.docx", FileMode.Open)
                    chunk.FeedData(fileStream)
                End Using
                Dim altChunk As New AltChunk()
                altChunk.Id = altChunkId
                altChunk.AltChunkProperties = New AltChunkProperties()
                altChunk.AltChunkProperties.MatchSource = New MatchSource()
                altChunk.AltChunkProperties.MatchSource.Val = OnOffValue.FromBoolean(True)
                mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements(Of Paragraph)().Last())
                mainPart.Document.Save()
            End Using

    Friday, November 9, 2012 1:34 AM

All replies

  • Hi Heel,

    Thanks for posting in the MSDN Forum.

    I don't think the headers or footers of the merged document can be carried over with your code. In my experience, different headers/footers can exist only in different sections; within a given section there is only one set of headers/footers. Your code reads the source document as a byte stream into the target document and does not create any header/footer parts.

    Have a good day,

    Tom


    Tom Xu [MSFT]
    MSDN Community Support | Feedback to us

    Monday, November 12, 2012 7:13 AM
    Moderator
  • Hi Heel Tar

    It can be done using altChunk but, as Tom indicates, preservation of Headers/Footers requires section breaks.

    You can see this in Word by opening a document, creating a header, then inserting another document with a different header into it. The header of the inserted document is lost because it's associated with the "last paragraph mark" of the document which is cut off. (Actually, what you lose is the SectionProperties child element of the Body element).

    Start over, and this time first open the document you want to insert. Click at the end of the document, then go to Page Layout/Page Setup/Breaks. Insert a Next Page section break. This will be a child of the last paragraph's ParagraphProperties child and will carry the header/footer reference information with it, as well as margin settings, paper orientation, number of newspaper columns, etc.

    Do the same for the second (target) document, then position the cursor at the end of that document (after the section break). Insert the first document and you should see that both retain their headers and footers.

    Here's some code to illustrate this using the Open XML SDK - not too "pretty" as I haven't had time to clean it up. It creates a new document and inserts the section break at the last paragraph mark (no header or footer). Then the code loops through all files (Word documents in *.docx format!) in a specified directory, inserts a section break at the last paragraph, saves the result, then imports it into the new document.

    private void btnMergeWordDocs_Click(object sender, EventArgs e)
    {
        string sourceFolder = @"C:\Test\MergeDocs\";
        string targetFolder = @"C:\Test\";
    
        string altChunkIdBase = "acID";
        int altChunkCounter = 1;
        string altChunkId = altChunkIdBase + altChunkCounter.ToString();
    
        MainDocumentPart wdDocTargetMainPart = null;
        Document docTarget = null;
        AlternativeFormatImportPartType afType;
        AlternativeFormatImportPart chunk = null;
        AltChunk ac = null;
        using (WordprocessingDocument wdPkgTarget = WordprocessingDocument.Create(targetFolder + "mergedDoc.docx", DocumentFormat.OpenXml.WordprocessingDocumentType.Document, true))
        {
            //Will create document in 2007 Compatibility Mode.
            //In order to make it 2010 a Settings part must be created and a CompatMode element for the Office version set.
            wdDocTargetMainPart = wdPkgTarget.MainDocumentPart;
            if (wdDocTargetMainPart == null)
            {
                wdDocTargetMainPart = wdPkgTarget.AddMainDocumentPart();
                Document wdDoc = new Document(
                    new Body(
                        new Paragraph(
                            new Run(new Text() { Text = "First Para" })),
                            new Paragraph(new Run(new Text() { Text = "Second para" })),
                            new SectionProperties(
                                new SectionType() { Val = SectionMarkValues.NextPage },
                                new PageSize() { Code = 9 },
                                new PageMargin() { Gutter = 0, Bottom = 1134, Top = 1134, Left = 1318, Right = 1318, Footer = 709, Header = 709 },
                                new Columns() { Space = "708" },
                                new TitlePage())));
                wdDocTargetMainPart.Document = wdDoc;
            }
            docTarget = wdDocTargetMainPart.Document;
            SectionProperties secPropLast = docTarget.Body.Descendants<SectionProperties>().Last();
            SectionProperties secPropNew = (SectionProperties)secPropLast.CloneNode(true);
            //A section break must be in a ParagraphProperties element
            Paragraph lastParaTarget = (Paragraph)docTarget.Body.Descendants<Paragraph>().Last();
            ParagraphProperties paraPropTarget = lastParaTarget.ParagraphProperties;
            if (paraPropTarget == null)
            {
                //ParagraphProperties must be the first child of the paragraph.
                //Insert it only when newly created; re-inserting an existing child would throw.
                paraPropTarget = new ParagraphProperties();
                lastParaTarget.InsertAt(paraPropTarget, 0);
            }
            paraPropTarget.Append(secPropNew);
    
            //Process the individual files in the source folder.
            //Note that this process will permanently change the files by adding a section break.
            System.IO.DirectoryInfo di = new System.IO.DirectoryInfo(sourceFolder);
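            //Only pick up *.docx files so that stray files in the folder are not opened as Word packages.
            IEnumerable<System.IO.FileInfo> docFiles = di.EnumerateFiles("*.docx");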
            foreach (System.IO.FileInfo fi in docFiles)
            {
                using (WordprocessingDocument pkgSourceDoc = WordprocessingDocument.Open(fi.FullName, true))
                {
                    IEnumerable<HeaderPart> partsHeader = pkgSourceDoc.MainDocumentPart.GetPartsOfType<HeaderPart>();
                    IEnumerable<FooterPart> partsFooter = pkgSourceDoc.MainDocumentPart.GetPartsOfType<FooterPart>();
                    //If the source document has headers or footers we want to retain them.
                    //This requires inserting a section break at the end of the document.
                    if (partsHeader.Count() > 0 || partsFooter.Count() > 0)
                    {
                        Body sourceBody = pkgSourceDoc.MainDocumentPart.Document.Body;
                        SectionProperties docSectionBreak = sourceBody.Descendants<SectionProperties>().Last();
                        //Make a copy of the document section break as this won't be imported into the target document.
                        //It needs to be appended to the last paragraph of the document
                        SectionProperties copySectionBreak = (SectionProperties)docSectionBreak.CloneNode(true);
                        Paragraph lastpara = sourceBody.Descendants<Paragraph>().Last();
                        ParagraphProperties paraProps = lastpara.ParagraphProperties;
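                        if (paraProps == null)
                        {
                            //ParagraphProperties must be the first child of the paragraph, so insert at index 0 rather than appending.
                            paraProps = new ParagraphProperties();
                            lastpara.InsertAt(paraProps, 0);
                        }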
                        paraProps.Append(copySectionBreak);
                    }
                    pkgSourceDoc.MainDocumentPart.Document.Save();
                }
                //Insert the source file into the target file using AltChunk
                afType = AlternativeFormatImportPartType.WordprocessingML;
                chunk = wdDocTargetMainPart.AddAlternativeFormatImportPart(afType, altChunkId);
                //Dispose the stream once its data has been fed into the chunk
                using (System.IO.FileStream fsSourceDocument = new System.IO.FileStream(fi.FullName, System.IO.FileMode.Open))
                {
                    chunk.FeedData(fsSourceDocument);
                }
                //Create the chunk
                ac = new AltChunk();
                //Link it to the part
                ac.Id = altChunkId;
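                //Insert after the previous altChunk (or after the last body paragraph for the first one)
                //so the documents stay in the order they were processed.
                DocumentFormat.OpenXml.OpenXmlElement insertAfter =
                    (DocumentFormat.OpenXml.OpenXmlElement)docTarget.Body.Elements<AltChunk>().LastOrDefault()
                    ?? docTarget.Body.Elements<Paragraph>().Last();
                docTarget.Body.InsertAfter(ac, insertAfter);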
                docTarget.Save();
                altChunkCounter += 1;
                altChunkId = altChunkIdBase + altChunkCounter.ToString();
                chunk = null;
                ac = null;
            }
        }
    
    }


    Cindy Meister, VSTO/Word MVP, my blog

    Monday, November 12, 2012 10:21 AM
    Moderator
  • Hi Cindy,

    Thanks for your adjustment.

    @Heel,

    Would you please try Cindy's suggestion?

    Have a good day,

    Tom


    Tom Xu [MSFT]
    MSDN Community Support | Feedback to us

    Tuesday, November 13, 2012 5:29 AM
    Moderator
  • Thank you very much for the response and sample code, Cindy.  I tried the code and it works for the most part, but I need assistance with two issues:

    1. The default header of my source documents uses page numbering.  How can I reset the page numbering within the merged documents such that the numbering restarts with each merged document?  I tried creating a PageNumberType instance with Start=1 and appending it to the copySectionBreak SectionProperties, but this did not work.
    2. The first document in the merged document has an incorrect header (it has the default rather than the header for the first page), and the footer is missing.  The other merged documents with similar header/footer structures look fine.
    Tuesday, November 13, 2012 3:05 PM
  • Hi Heel Tar

    #2 appears to be a bug in how the Word application renders the altChunks when it incorporates them into the main document - see my post to this forum earlier today:
    http://social.msdn.microsoft.com/Forums/en-US/oxmlsdk/thread/27e68a06-2b6f-482a-9011-db55a73269c3

    My brain is working on what might be done about it, but unless Word Automation Services are available to render the document then save it in the rendered state (thus incorporating all the altChunk files into the document.xml) so that you can work on it again in the Open XML SDK in a second pass, the only possibilities I'm coming up with would require the Word Interop to restore the "Different First Page" settings. For example, we can check whether a "First" type of header is referenced and set a bookmark with a name that tells us a section should have "Different First Page".

    I haven't looked at PageNumbering, yet, so can't offer an opinion on that at this time. Analysing all the different aspects of this question has taken many hours of many days, so far...


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, November 13, 2012 3:26 PM
    Moderator
  • Page Numbering:

    It works in some cases and not in others. I haven't done extensive testing, but it appeared NOT to work when the document had only a Default header defined. In a document with DifferentFirstPage (which was lost), and in all other documents for that matter, the setting was retained (and even generated where I didn't set it).

    The document where it was not retained was the first altChunk, although I can't be sure whether that had anything to do with it...
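    For reference, the page-numbering setting being tested here could be written along these lines. This is only a sketch: the class and method names are hypothetical, and placing the element in schema order is a precaution based on the assumption that Word may be strict about child order in sectPr. It does not change how Word treats the setting when it renders an altChunk.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class PageNumberSketch
{
    // Local names of the sectPr children that follow w:pgNumType in the schema sequence.
    static readonly string[] AfterPgNumType =
    {
        "cols", "formProt", "vAlign", "noEndnote", "titlePg", "textDirection",
        "bidi", "rtlGutter", "docGrid", "printerSettings", "sectPrChange"
    };

    // Hypothetical helper: sets (or adds) w:pgNumType w:start on a section.
    public static void RestartPageNumbering(SectionProperties sectPr, int start)
    {
        PageNumberType pgNum = sectPr.GetFirstChild<PageNumberType>();
        if (pgNum != null)
        {
            // Keep any existing number format and only change the start value.
            pgNum.Start = start;
            return;
        }

        pgNum = new PageNumberType { Start = start };

        // Insert before the first element that must come after pgNumType, if there is one.
        OpenXmlElement successor = sectPr.ChildElements
            .FirstOrDefault(e => AfterPgNumType.Contains(e.LocalName));
        if (successor != null)
            sectPr.InsertBefore(pgNum, successor);
        else
            sectPr.Append(pgNum);
    }
}
```

    It would be called on the cloned section properties, for example PageNumberSketch.RestartPageNumbering(copySectionBreak, 1), before they are appended to the last paragraph.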


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, November 13, 2012 4:20 PM
    Moderator
    All of my source documents have a DifferentFirstPage header (an image) plus the default header.  A variation of the document has a second section - one page with no header but a footer.  My merged document never shows the correct page numbers.  However, when the document with two sections is first, followed by one or more one-section documents, the numbering for the second document is reset, though the page number begins with two, as if it is counting the page from the second section as "one".  After that, the numbering is continuous and never reset.

    Given what you've seen, do you see a way at this point to get page numbering correct in the merged document? 

    Tuesday, November 13, 2012 5:32 PM
  • Hi Heel Tar

    At the moment, the only possibility I see is to use the interop - unless, as I mentioned before, you have access to Word Automation Services - to recreate these settings. Perhaps a macro in Word, or an Add-in.

    I've already set it up as a test case for Different First Page, placing a bookmark just before the copied section properties. After the document is opened, I run a macro that loops the bookmarks looking for the appropriate bookmark name, setting DifferentFirstPage, then deleting the bookmark. It works fine.

    A similar approach would work for restarting numbering.
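    The Open XML side of the marker idea could look roughly like the sketch below. The class name, method names and the bookmark prefix are all hypothetical, and the macro or add-in that later reads the bookmarks in Word is not shown. Bookmark ids need to be unique within a document, so the caller is assumed to supply a suitable one.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

static class FirstPageMarkerSketch
{
    // Hypothetical naming convention for the marker bookmarks.
    public const string MarkerPrefix = "DiffFirstPage_";

    // True when the section references a "first page" header or footer.
    public static bool UsesFirstPageHeaderFooter(SectionProperties sectPr)
    {
        return sectPr.Elements<HeaderReference>()
                   .Any(h => h.Type != null && h.Type.Value == HeaderFooterValues.First)
            || sectPr.Elements<FooterReference>()
                   .Any(f => f.Type != null && f.Type.Value == HeaderFooterValues.First);
    }

    // Appends an empty bookmark to the paragraph that carries the copied section break,
    // but only when the section uses a first-page header or footer.
    public static void MarkIfFirstPage(Paragraph breakPara, SectionProperties sectPr, int id)
    {
        if (!UsesFirstPageHeaderFooter(sectPr))
            return;

        string idText = id.ToString();
        breakPara.Append(
            new BookmarkStart { Id = idText, Name = MarkerPrefix + idText },
            new BookmarkEnd { Id = idText });
    }
}
```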

    A macro could be integrated as part of the document (docm, then, with the associated security issues). Or it could be a template or COM add-in that monitors DocumentBeforeOpen. In that case, you'd want a document property or document variable to identify these documents.

    I've alerted MS to the bug, but even if it's acknowledged it won't be fixed quickly. Unless you happen to be a "Partner" with a support contract that can leverage this into a "hot fix".


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, November 13, 2012 6:04 PM
    Moderator
  • NEW THOUGHTS: The other thing that has occurred to me would be to bring in these outside documents as SubDocuments (Master Document functionality, which you can see in the Outline View in Word). That is very old, "tried-and-true" technology, but the problem is the documents must be external and are linked in. They can later be fully integrated into the document, but again only through the interop.

    Hmmm, thinking about that... MasterDocument always starts out with TWO section breaks between documents, one of which is Continuous and is part of the "Master" (our "target"). I'm at the end of my day, here (after 7 p.m.) so I can't do any testing on it, but I do have to wonder if the reason for this second section break may be related to the problems we're seeing. You might try inserting a continuous section break in the "target" before each altChunk, just to see if it helps anything...
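    An untested sketch of that experiment: an empty paragraph carrying a Continuous section break is placed in the target body just ahead of an altChunk. The class and method names are made up for illustration, and whether this changes how Word renders the headers is exactly what would need to be checked.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

static class ContinuousBreakSketch
{
    // Hypothetical helper: builds an empty paragraph whose properties
    // carry a Continuous section break.
    public static Paragraph CreateContinuousSectionBreak()
    {
        return new Paragraph(
            new ParagraphProperties(
                new SectionProperties(
                    new SectionType { Val = SectionMarkValues.Continuous })));
    }

    // Places the break paragraph immediately before the given altChunk.
    // The altChunk must already be a direct child of the body.
    public static void InsertBreakBefore(Body body, AltChunk altChunk)
    {
        body.InsertBefore(CreateContinuousSectionBreak(), altChunk);
    }
}
```

    In the merge loop above this would be called right after the AltChunk has been inserted into docTarget.Body.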


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, November 13, 2012 6:10 PM
    Moderator
  • Hi Heel Tar

    The only other way, of course, is to recreate the documents "from scratch", which is a lot more work than the other methods discussed here. I've created a sample which copies over the documents, plus any graphics that are categorized as "blip" or "VML". You can download it here (the blog article isn't written, yet):
    http://homepage.swissonline.ch/cindymeister/BLOG/BlogLink.htm

    If your documents have a different kind of graphic you'll need to research what it is. You'll see that I've created an Overload for the methods for copying over these parts, so generating additional methods for other Part types should be fairly simple.

    My sample also demonstrates body vs. header/footer handling of these parts.

    Any documents containing other kinds of things, such as Custom XML Parts, Charts, ActiveX controls, etc. will also require supplementing the code with methods for handling these kinds of parts, along similar lines.


    Cindy Meister, VSTO/Word MVP, my blog

    Thursday, November 15, 2012 3:55 PM
    Moderator
  • Hi Heel Tar,

    Were you able to resolve or work around the issues you are having merging Word documents using any of Cindy's suggestions, including:
    including an extra section break before merging the first document,
    using SubDocuments, or
    using her sample code to create the document "from scratch"?

    From reading through the post, it seems that you have multiple "complete" documents that each contain their own headers, footers and page number and that you want all of this retained in the resulting document.

    To better understand your ultimate goal, can you explain why you are trying to merge multiple documents into a single file but still basically treat all of the components as individual documents (based on the premise that the headers and footers are different in each section and, more importantly, that the page numbering restarts at each section)?

    Best Regards,

    Donald M.
    Microsoft Online Community Support
    --------------------------------------------------------------------------------
    Please remember to click "Mark as Answer" on the post that helps you, and to click "Unmark as Answer" if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.

    Monday, November 19, 2012 11:42 PM
  • Donald,

    I have made no more progress on my end with the merging of documents.  I actually have gotten quite a bit busier in the past week with a deadline looming, and therefore have not been able to respond in this thread.  I really do appreciate the suggestions that Cindy has given, but have not tried the ones related to automation and interop due to my unfamiliarity in those areas.  I did run the sample application OpenXMLSDK_MergeWordDocs, and it consistently gave an exception on line 287 due to paraProps.SectionProperties being null.  I should have given this feedback earlier - sorry about that.

    Let me try to explain what I am doing, and you or Cindy can tell me if the merge approach is the best way.  I need to generate a fairly large number of word docs (around 500-600 per day) on the fly from database content.  These documents need to exist both as individual files to be sent electronically, and as printed copies to be mailed. 

    There did not seem to be an efficient way to print these documents.  Printing from file explorer opens and closes Word for each file printed.  The customer came up with the idea of producing a merged document containing all the individual files which could be printed, as a much quicker way to get printouts of these files.  I should note that the merged file does not need to be retained; it is only for printing and then can be deleted.

    I explored the merge concept for a few days, noticed the header and footer not appearing, and came to this forum.  Cindy's sample code from Nov. 12 got me considerably closer, but we then ran into likely bugs in the platform (noted above).  Even so, the original documents can be seen intact with headers/footers within alt chunks of the merged document (done by changing the extension to .zip and examining the internal content).

    Having said all this, is there a better way to achieve my goal than the merge approach with the Open XML SDK?  Should I attempt to construct them using Word interop?  Or is there a way to batch print word documents that does not involve Word being brought up and closed for each document? 

    Wednesday, November 21, 2012 12:46 AM
  • Hi Heel Tar

    << I did run the sample application OpenXMLSDK_MergeWordDocs, and it consistently gave an exception on line 287 due to paraProps.SectionProperties being null.  >>

    I did test extensively before I posted the sample, so I'd need more context around the error you're seeing. Especially since I don't know how to find a specific line of code (no numbers?)...

    If you have access to Word Automation Services (requires SharePoint 2010) then it might be possible to batch-print that way.

    It's also certainly possible to print using the Interop in such a way that Word is not started and exited for every document. It would probably be slower, although perhaps not appreciably, and the process should definitely be monitored as Word could throw up messages requiring user interaction. (No one would have to sit there and watch all the documents be opened, printed and closed; but someone should be working close by so that they can glance at the screen every now and then.)


    Cindy Meister, VSTO/Word MVP, my blog

    Wednesday, November 21, 2012 3:08 PM
    Moderator
  • The sample application gives a System.ArgumentNullException in the CopyBodySectionPropsIfNoSectionInLastPara() method on the statement:

    int existingSection = paraProps.SectionProperties.Count();

               //If there are no paraProperties they're needed and the section can be appended
                if (paraProps == null)
                {
                    paraProps = new ParagraphProperties();
                    lastpara.Append(paraProps);
                    paraProps.Append(copySectionBreak);
                }
                else
                {
                    //The last paragraph could have properties, but no section.
                    //If is no section break in the last paragraph...
                    int existingSection = paraProps.SectionProperties.Count();
                    if (existingSection < 1)
                        paraProps.Append(copySectionBreak);
                }
            }

    I can see that paraProps.SectionProperties is null.

    The application does merge simple documents, but fails on the ones I was trying to merge, those with first page only header containing image, default header containing page number, plus first page only footer.
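    A possible guard, sketched here with hypothetical names rather than taken from the sample: paraProps.SectionProperties is a single child element (or null), and calling Count() on it presumably resolves to the LINQ extension method, which would explain the ArgumentNullException when there is no section break in the last paragraph. Testing the property for null avoids the call altogether.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

static class SectionBreakGuard
{
    // Hypothetical replacement for the failing branch: attaches the copied
    // section break only if the last paragraph does not already carry one.
    public static void AppendSectionBreakIfMissing(Paragraph lastPara, SectionProperties copySectionBreak)
    {
        ParagraphProperties paraProps = lastPara.ParagraphProperties;
        if (paraProps == null)
        {
            // ParagraphProperties has to be the first child of the paragraph.
            paraProps = new ParagraphProperties();
            lastPara.PrependChild(paraProps);
        }

        // Null means "no section break here"; no Count() call is needed.
        if (paraProps.SectionProperties == null)
        {
            paraProps.SectionProperties = copySectionBreak;
        }
    }
}
```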

    Thursday, November 22, 2012 12:00 AM
  • Hi Heel Tar,

    I understand that you have a process which is generating these individual Word documents and the reason you are merging them together is to try to improve performance when printing so you do not have to open and close Word each time. What is unclear is if a user is manually starting the merge and/or print process or if these are "automatic" process that run without user interaction.

    I bring this up because, as described in KB 257757, we do not recommend or support server-side automation of Office, which includes running Office under Windows Task Scheduler, except under the interactive session of the user who is logged on. Basically, this means that if you have a user log on to the machine and manually start the process, you are fine. That being said, using Open XML to merge the documents is fully supported as an automated process on the server, but you may run into a limitation depending on how you are printing them.

    I still have not been able to determine what is happening with the Open XML, but I was able to code a solution using a few lines of VBA (again within the limits of automating Office on a server). Basically, if you call the following lines for each document you want to merge, it will maintain the headers and footers and force each section to restart page numbering at 1.

        Application.ActiveDocument.Sections.Add
        Application.ActiveDocument.Sections.Last.Footers(wdHeaderFooterPrimary).PageNumbers.RestartNumberingAtSection = True
        Application.ActiveDocument.Sections.Last.Footers(wdHeaderFooterPrimary).PageNumbers.StartingNumber = 1
        Application.ActiveDocument.Sections.Last.Range.ImportFragment "<path to Document>"

    Best Regards,

    Donald M.
    Microsoft Online Community Support

    Monday, December 17, 2012 10:35 PM
  • Hi Cindy

    I came across your article and found it extremely helpful. I am stitching a couple of documents together with a requirement that each document should retain its header and footer information in the final document. Using AltChunk instead of raw OpenXml saves a lot of effort regarding styles, formatting, etc. etc.

    Unfortunately, after a couple of days I can't seem to get a 100% working version due to a small and frustrating issue and I hope you have some insight as to a workaround?

    I modify each sub document, prior to adding it (as a chunk) into the master, by moving the last section properties into the paragraph (as per your sample code above), but Word seems to be adding a blank paragraph to each of these documents as it renders them in the final document.  I end up with:

    document 1 with correct header and footer
    [section properties/break]
    [blank paragraph]
    document 2 with correct header and footer
    [section properties/break]
    [blank paragraph]
    etc.

    I can't remove the blank paragraphs afterwards, as I ideally don't want to use WAS to render the document first.

    It seems as if you cannot have a next-page section break without a following paragraph - can you confirm or shed some light?

    Regards,
    Wim

    Thursday, February 27, 2014 10:26 AM
  • Hi Wim

    I'm not certain I understand the exact problem...

    Certainly, you cannot have a NextPage section break without a next page, and a page must have a paragraph on it.

    If you consider the actual Word Open XML and the Open XML SDK code you're using, you'll also note that every section break except the last is associated with a paragraph and is incorporated into a paragraph. This means the paragraph will be AFTER the break.

    Inserting a file containing a NextPage section break (talking UI now, not SDK) shows the paragraph as well. (AltChunk basically "mimics" Insert/File in the UI.)


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, March 4, 2014 4:41 PM
    Moderator