Updating Table of Contents, Conversion to PDF

  • Question

  • Hi,

    I am aware that this has been discussed in numerous other forum threads. I am also aware that Microsoft is pushing Word Automation Services as the solution for this issue. However, I don't think that is appropriate in our case. Let me explain our scenario.

    We have a document generation solution that runs on server machines. The solution uses the Open XML SDK and therefore does not rely on Microsoft Word. It generates documents by reading Word templates and data from an XML file and placing the data in the appropriate locations within the Word document, repeating where necessary. The output is a fully formatted document (formatting as per the template document, data read from the XML file).

    The solution runs fine except for two things:

      • TOC is NOT updated.
      • The client wants the output document to be read-only, preferably converted to PDF. Since the TOC is not updated, the read-only version has an outdated TOC.

    NOTE: Outputting read-only document means that we cannot go for a solution that updates TOC automatically when the user opens a document in Word (such as setting dirty flag as suggested on some other forum threads).
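    For reference, the "dirty flag" approach mentioned in the note can be sketched as follows. This is only an illustration in Python using the standard library, not the Open XML SDK code our solution uses. It assumes the raw XML of the main document part is available as a string, and the function name mark_fields_dirty is hypothetical. It sets w:dirty="true" on every field start marker, which asks Word to refresh those fields (including a TOC) when the document is next opened. ElementTree may rewrite namespace prefixes when it serializes, so real documents would need namespace-preserving tooling.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def mark_fields_dirty(document_xml: str) -> tuple[str, int]:
    """Flag every complex field as dirty; return the new XML and a count.

    Hypothetical helper for illustration only. Word still has to open the
    document to recompute the field results.
    """
    root = ET.fromstring(document_xml)
    count = 0
    for fld in root.iter(f"{{{W}}}fldChar"):
        # Only the "begin" marker of a field carries the dirty attribute.
        if fld.get(f"{{{W}}}fldCharType") == "begin":
            fld.set(f"{{{W}}}dirty", "true")
            count += 1
    return ET.tostring(root, encoding="unicode"), count
```

    As the note says, this only helps when the recipient opens the file in Word, so it does not address read-only or PDF output.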

    TOC update and conversion to PDF are not part of the Open XML SDK. Both are part of Word Automation Services. However, integrating Word Automation Services into our solution has several problems.

    1. It seems odd to install SharePoint 2010 just so that we can update a TOC and convert a Word document to PDF. There is a huge price tag associated with SP2010. Since our solution is installed (not hosted), this integration makes our product less likely to be accepted by customers, because the system requirements would include very expensive products.
    2. We could develop a web service that exposes Word Automation Services from a hosted installation of SP2010 (e.g. hosted by us), so that the client does not have to pay the full SP2010 costs. There are at least three issues with this approach:
      • We are not sure about the licensing implications of this solution.
      • Word Automation Services relies on document queues, i.e. the call is asynchronous in nature. Having to keep polling a hosted service for the completion of a queued job is a poor fit for a desktop document generation utility.
      • Our solution is installed on server machines that are fairly isolated. Even if we manage to get everything working, it is quite likely that clients will NOT allow any service calls from those machines to the outside world.

    One can use third party commercial tools for that but they do not seem to fulfil all scenarios of TOC update. We've tried quite a few but they seem to fail under complex scenarios (page numbers are incorrect, TOC is not updated at all, problems if document contains images etc. etc.).

    Can I therefore put in a request here to add this functionality to the Open XML SDK? I have read the explanation on some other threads that the Open XML SDK does not include a rendering library/engine and is therefore not able to update page numbers in a TOC. However, I am sure that the code exists somewhere in Microsoft Office. Can it not be extracted and made part of the Open XML SDK?

    Also, regarding the update of page numbers, I am pretty sure that, if we choose to include page numbers in the header/footer, the document generated by our solution does have these updated. I haven't gone through the code, but it is pretty obvious that we are NOT using anything other than Open XML to generate the document. If page numbers can be calculated correctly for the header/footer, I believe they can also be calculated for the TOC. If that is true, it is only a matter of identifying the appropriate headings to build the TOC.
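    The heading-identification step on its own can be sketched as follows. This is a rough Python illustration using the standard library rather than the Open XML SDK, and it makes some assumptions: headings use the default style IDs "Heading1", "Heading2" and so on (custom or localized templates may use other IDs), and collect_headings is a hypothetical name. It says nothing about page numbers.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: default style IDs of the form "Heading1" .. "Heading9".
HEADING_ID = re.compile(r"Heading([1-9])$")


def collect_headings(document_xml: str, max_level: int = 3) -> list[tuple[int, str]]:
    """Return (level, text) for each heading paragraph, in document order."""
    root = ET.fromstring(document_xml)
    entries = []
    for para in root.iter(f"{{{W}}}p"):
        style = para.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        if style is None:
            continue
        match = HEADING_ID.match(style.get(f"{{{W}}}val", ""))
        if match is None or int(match.group(1)) > max_level:
            continue
        # Heading text may be split across several runs; join them.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        entries.append((int(match.group(1)), text))
    return entries
```

    The result is only the list of TOC entries; which page each heading lands on still depends on layout.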

    Regards,

    Rashid Saleem


    Wednesday, November 21, 2012 10:15 AM

Answers

  • Hi Rashid

    << I've seen a few third party solution (doing Word to PDF conversion) doing the same... the calculation of page numbers is probably something that could easily be done outside Word >>

    As long as you know how a page layout program is going to interpret what you "feed" it in the way of characters, font face, font size, etc. it is possible to calculate the line and page lengths because those programs aren't going to "reflow" the document when it's opened. The information in those formats is static.

    The difference is that Word is a word processor, not a layout program. Word reflows documents constantly and bases its reflow decisions not only on font face, font size, etc. but also on how the screen and printer drivers work with a document's content. So a Word document may lay out and print differently on one computer than on another. A PDF document will always be the same.

    You could probably make "best guess" estimates based on the font and paragraph settings where pages will break, but they'll never be 100% accurate. And that's probably why those third-party components fail in some complex scenarios.


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, December 4, 2012 7:39 PM
    Moderator
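
    To illustrate the "best guess" point in the answer above, here is a deliberately naive Python sketch of a page estimator. Everything in it is an assumption made for illustration: the Para type, the estimate_start_pages name, the average character width (avg_char_em) and the line spacing are all invented, and no real layout engine works this simply. It estimates the page on which each paragraph starts from font size and text length alone.

```python
import math
from dataclasses import dataclass


@dataclass
class Para:
    text: str
    font_pt: float = 11.0


def estimate_start_pages(paras, page_width_pt=451.0, page_height_pt=698.0,
                         avg_char_em=0.5, line_spacing=1.15):
    """Guess the starting page of each paragraph (hypothetical model).

    avg_char_em is an assumed average glyph width as a fraction of the
    font size; real widths depend on the font and the rendering device.
    """
    pages = []
    page, used = 1, 0.0
    for p in paras:
        chars_per_line = max(1, int(page_width_pt / (p.font_pt * avg_char_em)))
        lines = max(1, math.ceil(len(p.text) / chars_per_line))
        line_height = p.font_pt * line_spacing
        # If the first line does not fit, the paragraph starts on a new page.
        if used + line_height > page_height_pt:
            page, used = page + 1, 0.0
        pages.append(page)
        for _ in range(lines):
            if used + line_height > page_height_pt:
                page, used = page + 1, 0.0
            used += line_height
    return pages
```

    Under this model, a small change in the assumed character width can be enough to push a paragraph onto a different page, which is the kind of drift that makes estimated TOC page numbers unreliable.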

All replies

  • Hi Rashid

    I agree and second the suggestion that it would be nice to have a "runtime tool" that could perform the tasks of updating documents generated using the Office Open XML file formats. Tying the functionality to SharePoint doesn't really make the full technology accessible...

    At the same time, however, I'd like to clear up one point you make:
    "Also, regarding update of page numbers, I am pretty sure that, if we choose to include page numbers in header/footer, the document generated by our solution does have these updated."

    The difference between the TOC and page numbers in a header/footer is that Word updates the latter automatically when the document is opened (and when it is printed) in Word, without any action on the part of the user. A TOC field has to be updated explicitly; the user could click in it and press F9, or use the option from the right-click menu. Another possibility would be a macro named AutoOpen in the document, but this would probably run into issues with security settings. A COM Add-in would probably be more reasonable.

    A COM Add-in could, for example, monitor the DocumentOpen event, check for something in the document (a document property, for example) and, if it's present, remove any document protection, update the fields, then reinstate document protection.

    This is about the only solution I can think of at present if you use Word documents instead of some other file format. Even if Microsoft were to revise its policy concerning rendering engines for Word and Excel, it's not going to happen soon enough for your solution, since it's not part of Office 2013. (Indeed, Office has been tied even more tightly to SharePoint in the newest version, not less.)


    Cindy Meister, VSTO/Word MVP, my blog

    Wednesday, November 21, 2012 2:37 PM
    Moderator
  • Hi Cindy,

    Sorry, I had to unmark your answer, which was (probably) marked automatically because I wasn't able to get back to this thread sooner.

    OK. I understand your point about Word updating page numbers in the header/footer section automatically when the document is opened. I've seen a few third-party solutions (doing Word to PDF conversion) do the same. The process goes like this:

    1. Generate the document using our app (which uses the Open XML SDK) and save it to the file system.
    2. DO NOT open the document in Word. Call the third-party component to load the document and convert it to PDF.
    3. Open the resulting PDF; the page numbers in the footer are all correct.

    But this is a third-party tool, so I don't think we can discuss it much here. My point is that, if third-party tools doing Word to PDF conversion can do this (i.e. update page numbers in the footer section), and there are quite a few out there, the calculation of page numbers is probably something that could easily be done outside Word. If that functionality were extracted as a .NET (or COM, if it's easier) library or, better still, merged into the Open XML SDK, the only remaining bit would be to identify the headings to fill the TOC list. I hope this doesn't look like a far-fetched idea. :) I think people wouldn't mind even if this were released as a premium version of the SDK at a reasonable price. I hope that would be much better than the SharePoint-based solution.

    We could use a third-party component for the same purpose, but the component would have to be tested against a number of document generation scenarios, producing headings and content in different styles. What we've seen in our tests is that these components fail in one complex scenario or another, and we have not built enough confidence to release one with our app.

    Regards,

    Rashid Saleem


    RS

    Tuesday, December 4, 2012 10:28 AM