Merged docs, somehow first section break is massacred

    Question

  • Hiya folks

    I have a main document (currently empty apart from a single 'tag/token' that marks where to insert the Open XML in the file).

    Now, I'm using 'Item.docx' to merge into that one (it will have some data bindings later).
    Item.docx contains section breaks (continuous). For some reason, after merging, Word treats the first one as a 'next page' break even though it displays it as continuous.

    If I extract the docx package, I can see that the 2 chunk docx files are actually correct: each one contains continuous breaks and displays them as continuous.

    At first glance it appears to be the first break only, but I'm not sure if it's perhaps also an issue when you start mixing many breaks in templates.

    Anyone have any clue as to why this could happen?

    For your convenience I've also assembled a small example:
    Example.rar
    http://www22.zippyshare.com/v/20461575/file.html

    I'm using Word 2013 btw, but also tested with a 2010 with the same results.




    • Edited by Kevin VP Thursday, March 14, 2013 7:28 PM
    Thursday, March 14, 2013 7:19 PM

All replies

  • Hi Kevin

    I'm not sure I follow everything you're doing, but apparently you're incorporating outside documents in another document using the altChunk method?

    It's not clear what you mean by this statement: "Now for some reason after merging Word interprets the first one as a 'next page' even while displaying continuous." Do you mean you're seeing "Continuous" on the page, but Word is forcing a new page before the section's content?

    If yes, what's the style formatting of the first line after the section break? Is it something like "Heading 1"? Is this style defined with "Page Break Before" in either document?

    Some further thoughts on altChunk and section breaks, in case the style formatting turns out to not be the issue (so that I don't lose the information)...

    The drawback/issue with this "simple" method is that it relies on how the Word application decides to incorporate these "embedded" files into the container document. In my experience, the application sometimes makes some odd choices that yield unexpected results if the "embedded" documents have any amount of complexity. Section breaks with headers and footers are definitely complex. You might want to look up a discussion between me and "Heel tar" in this forum about merging these kinds of documents...

    That said, the behavior you describe is reminiscent of what Word sometimes does with Insert File and some other features (mail merge, for example) when section breaks are involved. There's a setting in Word that influences what type of section break is generated by default: Page Layout/Page Setup dialog launcher/Layout, Section Start. If this is set to New Page, then Word will tend to create that kind of section break for any section break it considers "undefined". Changing this to "Continuous" will usually help.
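
    To see which break type a file actually declares, the section properties can be read straight from word/document.xml. What follows is a minimal sketch in Python, standard library only (the thread itself uses the .NET Open XML SDK). The function name section_types and the SAMPLE markup are made up for illustration. It relies on the WordprocessingML default that a w:sectPr without a w:type child is a next-page section:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def section_types(document_xml):
    """Return the declared start type of every section, in document order."""
    root = ET.fromstring(document_xml)
    types = []
    for sect_pr in root.iter(f"{{{W}}}sectPr"):
        type_el = sect_pr.find(f"{{{W}}}type")
        if type_el is None:
            # No w:type child: the section defaults to "nextPage".
            types.append("nextPage")
        else:
            types.append(type_el.get(f"{{{W}}}val"))
    return types


# Hypothetical markup: one continuous section break, then the final section,
# whose body-level w:sectPr carries no w:type at all.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:sectPr><w:type w:val="continuous"/></w:sectPr></w:pPr></w:p>
  <w:p><w:r><w:t>Second section</w:t></w:r></w:p>
  <w:sectPr/>
</w:body></w:document>"""

print(section_types(SAMPLE))  # ['continuous', 'nextPage']
```

    Running this against the chunk files, and against the merged document after Word has opened and re-saved it, shows whether the declared type differs between them.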


    Cindy Meister, VSTO/Word MVP, my blog

    Friday, March 15, 2013 1:33 PM
  • Well, if you'd like to see the behavior, I included an example where the behavior is shown, but I won't hold it against you of course if you don't trust the file. :) It's just 3 docx files though, so no worries.
    Sorry I had to use a file host, but adding attachments isn't possible here.

    Anyway, it's indeed what you're saying. I am basically 'merging' files. I have 1 main document in which I'm inserting the content of another document via the OpenXML SDK using the altchunks method (altchunks are file content/file references). If you extract the .docx in the example you'll see 2 altchunk documents in the package.
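
    For readers who want to inspect such a package without unzipping it by hand, here is a rough sketch in Python (standard library only, not the Open XML SDK used in this thread). It lists each w:altChunk in the main document part together with the part its relationship points to. The function names and the demo package are hypothetical, and the demo is deliberately stripped down and is not a complete .docx:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def list_alt_chunks(docx_file):
    """Return (relationship id, target part) for each w:altChunk, in document order."""
    with zipfile.ZipFile(docx_file) as pkg:
        doc = ET.fromstring(pkg.read("word/document.xml"))
        rels = ET.fromstring(pkg.read("word/_rels/document.xml.rels"))
    # Map relationship ids to the parts they point at.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{PKG_REL}}}Relationship")
    }
    result = []
    for chunk in doc.iter(f"{{{W}}}altChunk"):
        rel_id = chunk.get(f"{{{R}}}id")
        result.append((rel_id, targets.get(rel_id)))
    return result


def demo_package():
    """Build a stripped-down, hypothetical package in memory (not a full .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr(
            "word/document.xml",
            f'<w:document xmlns:w="{W}" xmlns:r="{R}"><w:body>'
            '<w:altChunk r:id="chunk1"/><w:altChunk r:id="chunk2"/>'
            "</w:body></w:document>",
        )
        pkg.writestr(
            "word/_rels/document.xml.rels",
            f'<Relationships xmlns="{PKG_REL}">'
            f'<Relationship Id="chunk1" Type="{R}/aFChunk" Target="/word/item1.docx"/>'
            f'<Relationship Id="chunk2" Type="{R}/aFChunk" Target="/word/item2.docx"/>'
            "</Relationships>",
        )
    buf.seek(0)
    return buf


print(list_alt_chunks(demo_package()))
```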

    Now, those documents will be repeated multiple times and data binding using content controls will determine their content. That's the complete setup. The example only contains the merging, not the binding.

    The problem is basically that the resulting file forces a new page after a continuous section break (Word shows continuous, but forces new page). Somehow (for now) it's just for the first one, the rest is rendered just fine.

    If you then check out the 2 contained files in the docx archive, you can literally see the 2 files with their continuous breaks rendering just fine. So it's weird that the final document renders them incorrectly.

    Also, the line after the break is not formatted as a heading, so no problem there. If that line's formatting were the cause, the second repetition of the content would have had the same issue.

    Currently the sections don't have separate headers/footers defined, but they'll matter in the final document to determine numbering.

    Oh and by now I've tested it further with more content, and it only seems to get more and more messed up with those breaks in the subdocuments, which I wouldn't expect to be the case.

    Is this just buggy rendering by Word then? Or am I expecting too much by expecting this to work the way it should?

    Imagery:

    Example

    Saturday, March 16, 2013 1:04 AM
  • Hi Kevin

    I have no software that can open a *.rar file...

    From the experience with altChunk I mentioned, I've come to the conclusion that altChunk is only reliable for "simple" things.

    The problem isn't with the Open XML SDK, it's with how Word renders altChunks it finds in the docx file. I'm not sure whether the Word team would use the term "buggy". I get the impression that they simply didn't envision people would actually try to leverage altChunk in this manner. For us, looking for optimal solutions, it's a "no-brainer" that we'd want to use this and would expect it to be able to handle altChunk the same as when we'd use Insert File. But if you're so close to something you sometimes don't see all the ramifications...

    If you need this to work reliably, then I'm afraid you'll have to bite the bullet and merge the files the hard way, directly into the Open XML, rather than letting Word do the work for you.


    Cindy Meister, VSTO/Word MVP, my blog

    Sunday, March 17, 2013 6:51 AM
  • Tnx Cindy.

    Although I'm afraid that'll be pretty hard since I require a template approach where altchunks suited our needs with data binding to content controls afterwards. It's probably gonna be pretty difficult to achieve that level of templating w/o the altchunks so I'm afraid I'll have to see about making it work w/o the section breaks.

    After all ... I doubt I'll be able to just get the content from an existing template document and insert it as the full XML in the container document? Or is this simpler than I imagine?

    I assume I'd need to get all the content beneath the 'body' tag and insert it into the container document somehow, but will that markup match?

    It's unfortunate this doesn't work with altchunks though.

    Monday, March 18, 2013 9:35 AM
  • Hi Kevin

    I was actually thinking about a more piecemeal approach for your problem, where you pick up the information section-by-section, creating the section breaks "manually" in the target document.

    I'm not in full "Open XML" mode at the moment, where I can quote the object model off the top of my head (too much other stuff going on), but:

    You open the source file as an Open XML document. Get an IEnumerable collection of the Paragraphs. Loop the collection and, as you go, you should be able to append the Paragraph.OuterXml to the XML of the target document (or create a Paragraph object in the document and assign the Paragraph.InnerXml to it - there are lots of variations for transferring the XML as a block).

    Another possibility that might work, and would be faster if it does, would be to simply append the Body.InnerXml of the source to the target.
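
    The whole-body variant can be sketched as follows. This is Python on raw WordprocessingML (standard library only), not the Open XML SDK object model described above, and the names append_body, TARGET and SOURCE are hypothetical. One assumption is spelled out in the code: the source's trailing body-level w:sectPr is not copied as-is but moved into the w:pPr of the last copied paragraph, so that the target body keeps a single final w:sectPr:

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def w(tag):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def append_body(target_body, source_body):
    """Append copies of the source body's children to the target body."""
    children = [copy.deepcopy(child) for child in source_body]

    # A body-level w:sectPr is only valid as the last child of w:body, so the
    # source's copy is turned into a paragraph-level section break instead.
    if children and children[-1].tag == w("sectPr"):
        final_sect = children.pop()
        last = children[-1] if children else None
        needs_new_paragraph = (
            last is None
            or last.tag != w("p")
            or last.find(f"{w('pPr')}/{w('sectPr')}") is not None
        )
        if needs_new_paragraph:
            last = ET.Element(w("p"))
            children.append(last)
        p_pr = last.find(w("pPr"))
        if p_pr is None:
            p_pr = ET.Element(w("pPr"))
            last.insert(0, p_pr)
        # Appending is a simplification: w:sectPr belongs near the end of w:pPr.
        p_pr.append(final_sect)

    # Keep the target's own final w:sectPr as the last child of its body.
    existing = list(target_body)
    insert_at = len(existing)
    if existing and existing[-1].tag == w("sectPr"):
        insert_at -= 1
    for offset, child in enumerate(children):
        target_body.insert(insert_at + offset, child)


TARGET = f'<w:body xmlns:w="{W}"><w:p/><w:sectPr/></w:body>'
SOURCE = (
    f'<w:body xmlns:w="{W}"><w:p><w:r><w:t>Item</w:t></w:r></w:p>'
    '<w:sectPr><w:type w:val="continuous"/></w:sectPr></w:body>'
)

target = ET.fromstring(TARGET)
append_body(target, ET.fromstring(SOURCE))
print([child.tag.split("}")[1] for child in target])  # ['p', 'p', 'sectPr']
```

    Copying works on deep copies so the source tree stays untouched and can be appended repeatedly, which matches the "repeat the item template several times" setup in this thread.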

    Of course, in either case, you need to watch out for and handle any references to content contained in other files, such as pictures, headers, footers, pictures in headers/footers, etc.
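
    One way to find out which external parts a copied fragment depends on is to collect the relationship IDs it uses. A small sketch in Python (standard library only), assuming that every attribute in the officeDocument relationships namespace (r:id, r:embed and so on) holds a relationship ID. The function name and the simplified sample markup are hypothetical:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"


def referenced_relationship_ids(element):
    """Collect the values of all r:* attributes found under the element."""
    ids = set()
    for node in element.iter():
        for name, value in node.attrib.items():
            # Assumption: any attribute in the relationships namespace is a
            # relationship id that must also exist in the target package.
            if name.startswith(f"{{{R}}}"):
                ids.add(value)
    return ids


# Simplified, hypothetical fragment: a picture reference and a header reference.
SAMPLE = f"""<w:body xmlns:w="{W}" xmlns:r="{R}" xmlns:a="{A}">
  <w:p><w:r><w:drawing><a:blip r:embed="rId5"/></w:drawing></w:r></w:p>
  <w:sectPr><w:headerReference w:type="default" r:id="rId7"/></w:sectPr>
</w:body>"""

print(sorted(referenced_relationship_ids(ET.fromstring(SAMPLE))))  # ['rId5', 'rId7']
```

    Each ID found this way points at a part (image, header, footer) that would have to be copied into the target package and given a relationship there, with the IDs in the copied XML updated to match.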


    Cindy Meister, VSTO/Word MVP, my blog

    Monday, March 18, 2013 10:21 AM
  • Yeah, good suggestion.

    I've checked it out a bit already, and it seems that's also a valid approach to merging the documents: simply clone the whole body of the databound 'sub' document (which I'm now using as altchunks) into the new document.

    I'll see where I get with this once I get back into the project sometime this week (if I'm lucky, it's been very busy) :)

    Tnx for the assist

    Monday, March 18, 2013 7:13 PM