How to convert Word to XML on server side automatically?

  • Question


    Here is my problem: my organization wants users to upload Word documents to the server. On the server side, each Word document (with enforced styles) needs to be converted to XML. Next, I need to use PHP to parse the Open XML files and put the content into the database. Does anyone know how to convert Word to XML on the server side automatically? Is there any API or sample code for parsing the Open XML formats in PHP? Your suggestions are appreciated.

    Thursday, July 26, 2007 9:58 PM

All replies

  • Hi,


    What do you mean? Are your documents in the old binary format? If so, why? You can have the users save the document in the XML format, and then you will need no conversion. Microsoft Office supports this from 2000 onwards.



    Friday, July 27, 2007 6:22 PM
  • Hi, thanks for your reply. It is just a regular .doc document. However, we don't want users to save the document in XML format. We want to convert the .doc file to XML format on the web server side after users submit their .doc file. Is there any way to programmatically convert the .doc file to XML format on the web server side?


    Friday, July 27, 2007 8:38 PM
  • Did you come up with a solution yet? I'm looking to do the same kind of thing.


    Friday, August 24, 2007 6:17 PM
  • If the web server can host the Microsoft Word process (Winword.exe), you can use a single instance of it to open each document and save it from .doc to .docx or .xml efficiently.

    All you need to do is monitor the folder(s), open, save (and perhaps delete useless files).

    Opening and parsing the binary format seems insane to me.
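
As a rough illustration of the monitor, open, save loop described above, here is a minimal Python sketch. The names `convert_pending`, `watch`, `inbox`, and `outbox` are made up for this example, and the `convert` callable is only a placeholder: the real call that drives Word (or any other converter) is not shown and would have to be supplied.

```python
import time
from pathlib import Path
from typing import Callable, List


def convert_pending(inbox: Path, outbox: Path,
                    convert: Callable[[Path, Path], None]) -> List[Path]:
    """Convert every .doc in `inbox` that has no .docx counterpart in `outbox`.

    `convert(src, dst)` is a hypothetical hook; it is expected to write the
    converted file to `dst` (for example by driving one Word instance).
    """
    outbox.mkdir(parents=True, exist_ok=True)
    produced = []
    for src in sorted(inbox.iterdir()):
        # Only plain .doc files are picked up; anything else is ignored.
        if not src.is_file() or src.suffix.lower() != ".doc":
            continue
        dst = outbox / (src.stem + ".docx")
        if dst.exists():
            continue  # already converted on an earlier pass
        convert(src, dst)
        produced.append(dst)
    return produced


def watch(inbox: Path, outbox: Path,
          convert: Callable[[Path, Path], None],
          interval_seconds: float = 5.0) -> None:
    """Poll the folder forever; a simple stand-in for a real file watcher."""
    while True:
        convert_pending(inbox, outbox, convert)
        time.sleep(interval_seconds)
```

Polling keeps the sketch free of dependencies; a file system notification API would avoid the delay between passes, and deleting or archiving processed uploads would go after the `convert` call.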


    Saturday, August 25, 2007 5:37 PM
  • Funny, one of my company's clients also has the same problem - there must be an epidemic - but I was directed to Doug Mahugh's blog posting

    wherein he describes how to use the OFC.exe command-line utility to convert from .doc to .docx format.


    What I intend to do is write a Windows service that waits for the delivery of any .doc files, intercepts them, runs the converter, and then uses a small utility to extract the document.xml file as text using the new SDK (a technique I gleaned from perusing Wouter's blog and his great book). Then I'll be able to do anything I like with the resulting XML. I know nothing about PHP, so I'd love to chat about that; in my case I'm going to use the XLinq capability of VB9.
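
For anyone not using the SDK, the extraction step can also be sketched with nothing but a ZIP reader, since a .docx file is a ZIP package. The Python below is an illustration only, not the SDK approach mentioned above. It assumes the main part lives at the conventional path `word/document.xml` (strictly, the package relationships decide that), and `read_paragraphs` is a name invented for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def read_paragraphs(docx_file):
    """Return a list of (style_id, text) pairs, one per paragraph.

    `docx_file` may be a path or a file-like object. The part name
    "word/document.xml" is an assumption based on the usual package layout.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # The paragraph style, if any, is stored as <w:pPr><w:pStyle w:val="..."/>
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        # Paragraph text is spread over one or more <w:t> runs
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append((style_id, text))
    return paragraphs
```

Because the original question relies on enforced styles, keeping the style id next to each paragraph's text is probably the useful shape for loading rows into a database; the same idea should carry over to PHP with its ZIP and XML extensions.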




    Tuesday, September 4, 2007 7:48 PM
  • Great!

    OFC would be an option, and a Windows service built with .NET should pack it all.

    Monday, September 10, 2007 1:18 AM
  • I have some old .php files that a lady used to print some signs/banners for an annual advertisement project we do for a beauty contest. She no longer has the Home Publishing program, and we need to open the files in something else. What can we do to open these .php files and save them in another format?


    Thursday, October 11, 2007 12:06 PM
  • If I understand correctly, you are in the plain-text-to-Open-XML scenario. Is that right?

    If so, I think you should post a new thread, but anyway, tell me if I'm wrong.

    Some years ago Don Box (not sure) wrote an XMLTextReader...


    Thursday, October 11, 2007 1:02 PM