Generating simplefields with conversion doc > docx

  • Question

  • I hope one of you can give me some insight into what happens internally during the doc > docx conversion.

    I would naively expect that a mergefield in a doc produces a simplefield in the docx when the doc is saved as docx.

    I also want to refer to my other post; it seemed resolved, but unfortunately it isn't:

    http://social.msdn.microsoft.com/Forums/en-US/oxmlsdk/thread/1f62a673-827e-414b-9975-fdb6e4dd23ef/

    I have written a "conversion app" that opens doc files and saves them as docx files.

    Since just opening and saving doesn't suffice to get the expected simplefield for each mergefield in the doc, I had to do the following in the code:

    1. Open the doc.

    2. Turn off rsid.

    3. Save the doc as docx under a different name.

    4. Open the docx that was just saved.

    5. Make sure rsid is still off.

    6. Save the docx as docx under the original name.

    For some documents (.doc) this converts all the mergefields correctly into simplefields.

    But, yes, there is a but: not for all of them.

    Why not? A doc with just a mergefield (no text) goes wrong.

    A doc with "some text : <mergefield>" goes wrong.

    A doc with "some text:" and then "<mergefield>" on the next line goes right.

    So I thought it might have to do with the first line. In the conversion app I added some text to the top of the page (so that mergefields are never on the first line anymore) before saving as docx, without success...

    Please let there be someone out there that can give me some insight on what exactly decides whether a simplefield is created or not
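
    For reference, WordprocessingML can store a field in two ways: the simple form, a single `w:fldSimple` element carrying the instruction in its `w:instr` attribute, and the complex form, where runs containing `w:fldChar` markers (`begin`, `separate`, `end`) surround one or more `w:instrText` runs. A minimal Python sketch for listing which form each MERGEFIELD in a document part takes could look like this. The function name `list_merge_fields` and the sample markup are illustrative assumptions, and nested fields or fields that span paragraphs are not handled.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def list_merge_fields(document_xml: str):
    """Return (kind, instruction) pairs for every MERGEFIELD found."""
    root = ET.fromstring(document_xml)
    found = []
    # Simple form: the whole instruction sits in one w:instr attribute.
    for fld in root.iter(f"{{{W}}}fldSimple"):
        instr = fld.get(f"{{{W}}}instr", "").strip()
        if instr.upper().startswith("MERGEFIELD"):
            found.append(("simple", instr))
    # Complex form: the instruction may be split over several w:instrText
    # runs that sit between the "begin" marker and "separate" (or "end").
    for para in root.iter(f"{{{W}}}p"):
        parts, inside = [], False
        for node in para.iter():
            if node.tag == f"{{{W}}}fldChar":
                kind = node.get(f"{{{W}}}fldCharType")
                if kind == "begin":
                    parts, inside = [], True
                elif kind in ("separate", "end") and inside:
                    instr = "".join(parts).strip()
                    if instr.upper().startswith("MERGEFIELD"):
                        found.append(("complex", instr))
                    inside = False
            elif node.tag == f"{{{W}}}instrText" and inside:
                parts.append(node.text or "")
    return found

# Hypothetical sample: one simple field and one complex field whose
# instruction text is split over two runs.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:fldSimple w:instr=" MERGEFIELD FirstName ">
  <w:r><w:t>FirstName</w:t></w:r></w:fldSimple></w:p>
<w:p>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText xml:space="preserve"> MERGEFIELD Last</w:instrText></w:r>
  <w:r><w:instrText xml:space="preserve">Name </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>LastName</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p></w:body></w:document>"""

for kind, instr in list_merge_fields(SAMPLE):
    print(kind, instr)
```

    For a real file, the markup would usually be read from the docx package with `zipfile.ZipFile(path).read("word/document.xml")`. The sketch reports all simple fields first and then the complex ones, so the result is not in document order.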

    If anything in the above is not clear, don't hesitate to ask.

    Albert



    AlbertRib

    Thursday, March 28, 2013 1:41 PM

All replies

  • Hi Albert,

    Thank you for posting in the MSDN Forum.

    I'll consult my colleague on your issue. You'll be informed if there's any update.

    Thank you for your patience and understanding.

    Best regards,


    Quist Zhang [MSFT]

    Friday, March 29, 2013 11:32 AM
    Moderator
  • Hi Albert,

    I followed your steps and was able to see a difference in the generated code between the converted doc and a document created in Word 2010.

    I did find that when I closed the converted doc, reopened it, added and deleted a paragraph, and then saved it again, the code looked the same when I compared the files using the OOXML tool.

    I read through your discussion with Cindy in the other post and I'm not certain why there is this difference.

    Please explain how this impacts your application.  Either document will connect to a datasource and populate the mergefield. I'm doing this in the UI and I would think programmatically it would work the same.

    Thanks,

    Harold


    Harold Kless, Microsoft Online Community Support


    Friday, March 29, 2013 9:50 PM
  • Quist, Harold,

    First of all, thanks for taking the time to look into my issue.

    Harold, it impacts my application in such a way that without the old MERGEFIELDs from the .doc becoming simplefields in the .docx, there is no application...

    Our company has so many problems using Word to do the merging (especially on Terminal Server) that I decided to find out whether I can do the merging myself, without Word.

    To accomplish this, I had to familiarize myself with the new (XML) structure of the documents and see whether I have enough information about, and control over, the document.

    So I built a docx document with mergefields and began "reading" the structure. This is where I came across the simplefields, and I was happy. So I built my app, and the first test gave very satisfying results. To give you an idea: merging a letter the old way (init Word, open template, build datasource, copy template to temp location, merge it using Word, do some other stuff, copy the merged file back to the location designated by the user) takes about 20-30 seconds on average. My application can merge up to 4000 letters per minute!
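
    To illustrate why simplefields are convenient for a Word-free merge, here is a minimal Python sketch (not the application described in this post). It assumes every merge field in the document part is a `w:fldSimple` whose `w:instr` starts with `MERGEFIELD <name>`; the function name `merge_simple_fields` and the `record` dictionary are hypothetical, and nested fields are not handled.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the usual "w:" prefix when serializing

# The field name may be bare (MERGEFIELD FirstName) or quoted
# (MERGEFIELD "First Name").
MERGEFIELD = re.compile(r'\s*MERGEFIELD\s+(?:"([^"]+)"|(\S+))', re.IGNORECASE)

def merge_simple_fields(document_xml: str, record: dict) -> str:
    """Replace each w:fldSimple MERGEFIELD by its runs, filled from record."""
    root = ET.fromstring(document_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for fld in list(root.iter(f"{{{W}}}fldSimple")):
        match = MERGEFIELD.match(fld.get(f"{{{W}}}instr", ""))
        if match is None:
            continue  # some other field type
        name = match.group(1) or match.group(2)
        texts = list(fld.iter(f"{{{W}}}t"))
        if name not in record or not texts:
            continue  # unknown field or no result run: leave it untouched
        # Put the value in the first text node and blank any others,
        # so the formatting of the first run is kept.
        texts[0].text = str(record[name])
        for extra in texts[1:]:
            extra.text = ""
        # Swap the field wrapper for the runs it contained.
        parent = parent_of[fld]
        position = list(parent).index(fld)
        parent.remove(fld)
        for offset, run in enumerate(list(fld)):
            parent.insert(position + offset, run)
    return ET.tostring(root, encoding="unicode")
```

    Calling `merge_simple_fields(xml, {"FirstName": "Albert"})` returns the markup with the field replaced by plain runs. A field stored in the complex form is invisible to this approach, which is why the outcome of the doc > docx conversion matters so much here.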

    Our customers (over 300) all have different self-built documents which they use for merging. Those documents are all .doc. As you can understand, we are not going to tell them to rebuild all their documents as docx. So I had to write another application that converts doc into docx. And this is where this thread came to life.

    It really frustrates me that, not to mention the million settings in Word that might or might not change the end result, I cannot depend on Word's internal conversion. It seems that every doc I convert has a different end result in the docx.

    I do have the feeling that there is nobody that truly understands what the hell is happening "under the hood" so i started adjusting my conversion app. I will replace the mergefields by flat text and save this as a docx. Then i will see if i can "reverse merge" the docx with the flat text into a docx with simplefields. 
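
    The "reverse merge" idea can be sketched roughly as follows in Python. This is only an illustration under strong assumptions: the flat text uses a hypothetical marker format `[[FieldName]]`, each marker fills exactly one run that is a direct child of its paragraph, and the function name `tokens_to_simple_fields` is made up. A marker that Word has split across several runs is left untouched.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the usual "w:" prefix when serializing

TOKEN = re.compile(r"\[\[(\w+)\]\]")  # hypothetical marker, e.g. [[FirstName]]

def tokens_to_simple_fields(document_xml: str) -> str:
    """Wrap every run whose text is exactly one marker in a w:fldSimple."""
    root = ET.fromstring(document_xml)
    for para in list(root.iter(f"{{{W}}}p")):
        for position, run in enumerate(list(para)):
            if run.tag != f"{{{W}}}r":
                continue
            texts = list(run.iter(f"{{{W}}}t"))
            match = TOKEN.fullmatch("".join(t.text or "" for t in texts))
            if match is None:
                continue  # ordinary text, or a marker split across runs
            name = match.group(1)
            # Show the field name in guillemets as the field result.
            texts[0].text = "\u00ab" + name + "\u00bb"
            for extra in texts[1:]:
                extra.text = ""
            field = ET.Element(
                f"{{{W}}}fldSimple",
                {f"{{{W}}}instr": f" MERGEFIELD {name} "},
            )
            # Replace the run by the field at the same position and move
            # the run (with its formatting) inside the field.
            para.remove(run)
            field.append(run)
            para.insert(position, field)
    return ET.tostring(root, encoding="unicode")
```

    Because each matching run is replaced one-for-one at the same index, the positions of the remaining children of the paragraph do not shift while it is being processed.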

    If there is anyone out there who knows a bit more about the internal Word conversion, please shed some light!

    Thanks in advance,

    Albert



    AlbertRib

    Saturday, March 30, 2013 9:31 AM
  • Hi Albert

    <<I do have the feeling that there is nobody that truly understands what the hell is happening "under the hood" so i started adjusting my conversion app. I will replace the mergefields by flat text and save this as a docx. Then i will see if i can "reverse merge" the docx with the flat text into a docx with simplefields.>>

    Word is extremely complex and certain things are simply unpredictable (except, perhaps for the person who wrote the code that's giving the results, who may not even be on the Word team anymore). So it makes sense that you "handle" the files converted to *.docx.

    But may I suggest that perhaps you should convert files away from merge fields to use something else less "volatile"? There's no telling when Word might decide to change the way the merge field is structured if a document is later opened and saved again in Word.

    The ContentControls were introduced in Word 2007 for just this purpose: "merging" (and "mining") data. They're much more predictable than merge fields or bookmarks. They can be linked to a CustomXMLPart, making reading/writing the data much simpler (as you're dealing with a single, dedicated XML file).
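
    As a rough illustration of the "single, dedicated XML file" point: a bound content control is a `w:sdt` element whose `w:sdtPr` contains a `w:dataBinding` (with an XPath and a store item ID) pointing into a custom XML part of the package. A minimal Python sketch for writing new values into such a part without Word could look like this. The part name, the flat element layout of the custom XML and the function name `update_custom_xml` are assumptions made for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def update_custom_xml(docx_bytes: bytes, part_name: str, values: dict) -> bytes:
    """Return a copy of the package with one custom XML part updated.

    Assumes the part is a flat, namespace-free list of elements under
    the root, e.g. <letter><FirstName>...</FirstName></letter>.
    """
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as source, \
         zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == part_name:
                root = ET.fromstring(data)
                for child in root:
                    if child.tag in values:
                        child.text = str(values[child.tag])
                data = ET.tostring(root, encoding="utf-8",
                                   xml_declaration=True)
            # Every other part is copied over unchanged.
            target.writestr(item, data)
    return output.getvalue()
```

    A call such as `update_custom_xml(data, "customXml/item1.xml", {"FirstName": "Albert"})` only touches the data part. The text cached inside each `w:sdtContent` in the document part is not updated by this sketch; as far as I know Word refreshes bound controls from the custom XML part when it opens the file, but a tool that reads `word/document.xml` directly would still see the old text.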


    Cindy Meister, VSTO/Word MVP, my blog

    Monday, April 1, 2013 4:07 PM
    Moderator
  • Cindy, 

    I am not familiar with ContentControls, so I will do some research on those. However, I think it will be just another workaround, right?

    The issue here is that Word's internal conversion is unpredictable (and not only that: even creating documents with mergefields in W2010 gives different outcomes, but let's not go into this right now).

    If i interpret it right, i would, instead of rebuilding the doc > docx using simplefields, i would rebuild doc > docx using ContentControls?

    Maybe I'm way off and should have read up on ContentControls first, but this is what immediately came to mind.

    "Word is extremely complex and certain things are simply unpredictable (except, perhaps for the person who wrote the code that's giving the results, who may not even be on the Word team anymore). "

    Word being extremely complex > agreed. Things being unpredictable, unacceptable in my book. Especially in programming, things should always be predictable. Its just 0's and 1's right?

    Going to have a look at ContentControls. Thanks again for your thoughts!

    Albert


    AlbertRib

    Tuesday, April 2, 2013 6:21 AM
  • Hi Albert

    <<If i interpret it right, i would, instead of rebuilding the doc > docx using simplefields, i would rebuild doc > docx using ContentControls?>>

    Correct.

    <<Things being unpredictable, unacceptable in my book. Especially in programming, things should always be predictable. Its just 0's and 1's right?>>

    Under the covers, yes. But you have to keep in mind that Word is doing a lot of things in the UI (and when you automate it) to ensure things are "intuitive" for the user, to manage layout and a number of other things we're hardly aware of under normal circumstances. These things affect what's written to the Open XML file, as the core Word code branches according to internal rules which have evolved over 25 years of features being added on to the original application. In the end, the developers are concerned that it all "works" in the UI, but not in providing consistency for people consuming the underlying file format (which has changed TWICE since the core code was written, prior to 1990)...


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, April 2, 2013 10:26 AM
    Moderator
  • Thanks Cindy,

    I will look into the ContentControls.

    " In the end, the developers are concerned that it all "works" in the UI, but not in providing consistency for people consuming the underlying file format"

    brrrrr, this gives me the chills.. 

    Albert


    AlbertRib

    Tuesday, April 2, 2013 10:51 AM
  • <<" In the end, the developers are concerned that it all "works" in the UI, but not in providing consistency for people consuming the underlying file format"

    brrrrr, this gives me the chills.. >>

    Mmmm. Historically, no one was supposed to be able to work with the files outside the Word application, so it wasn't really an issue (goal). The idea that the file formats should be just as - or more - accessible to developers than what goes on in the application is "foreign".

    And we're all familiar with the problems of "scope": resources (time/money) are limited and you have to get the product out the door. Word has always been, and still is, an application targeted at the end-user; the developer is an afterthought. The primary customer target isn't going to change, but the file format will need more attention. The Word team is beginning to realize this, as more and more people start to work with the files instead of using the APIs.

    But for the present, we have to work a bit harder to achieve what we'd like to!


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, April 2, 2013 2:37 PM
    Moderator