Convert Word file to web page

  • Question

  • I have MS Word documents that include some images, and I need to convert them to an HTML page.

    Any good ideas? I can parse the document for text, but what about the images?


    Thursday, April 1, 2010 4:06 PM

All replies

  • Hi arthur tu,

    Thanks for your question.

    To get the ImagePart of a document, you need to know the relationship ID (relationshipId) that the body of the document uses to reference it. This link, which explains how to insert a picture into a Word document, will help you learn the file format quickly. Once you have the relationshipId, call MainDocumentPart.GetPartById(string id) to get the ImagePart, and then use ImagePart.GetStream() to get its stream. You can save the stream as an image file whose type is given by the part's ContentType, and then link to that image from your HTML page.
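    For readers without the Open XML SDK at hand, here is a minimal sketch of the same steps using only the Python standard library and the raw .docx package (a ZIP file). It is not the SDK API: it assumes the image is embedded (not linked), that the relationship lives in word/_rels/document.xml.rels, and the helper names get_image_part and image_to_html are hypothetical. It resolves the relationshipId to a part, looks up the part's content type in [Content_Types].xml, and writes the bytes to a file referenced by an <img> tag.

```python
import html
import mimetypes
import os
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Namespaces of the package-level parts (Open Packaging Conventions).
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def get_image_part(docx_path, rel_id):
    """Return (part_name, content_type, data) for the part that rel_id points to.

    Hypothetical helper playing the role of MainDocumentPart.GetPartById(rel_id)
    followed by ImagePart.GetStream(); assumes an internal (embedded) image.
    """
    with zipfile.ZipFile(docx_path) as pkg:
        # 1. Resolve the ID in the relationships of word/document.xml.
        rels = ET.fromstring(pkg.read("word/_rels/document.xml.rels"))
        target = None
        for rel in rels.findall(f"{{{PKG_REL_NS}}}Relationship"):
            if rel.get("Id") == rel_id:
                target = rel.get("Target")
                break
        if target is None:
            raise KeyError(f"no relationship with Id {rel_id!r}")
        if target.startswith("/"):  # absolute part name
            part_name = target.lstrip("/")
        else:  # relative to the word/ folder, e.g. "media/image1.png"
            part_name = posixpath.normpath(posixpath.join("word", target))

        # 2. Content type: an <Override> for the part wins, else the <Default> for its extension.
        types = ET.fromstring(pkg.read("[Content_Types].xml"))
        content_type = None
        for override in types.findall(f"{{{CT_NS}}}Override"):
            if override.get("PartName") == "/" + part_name:
                content_type = override.get("ContentType")
        if content_type is None:
            ext = posixpath.splitext(part_name)[1].lstrip(".").lower()
            for default in types.findall(f"{{{CT_NS}}}Default"):
                if default.get("Extension", "").lower() == ext:
                    content_type = default.get("ContentType")

        # 3. The raw bytes of the image (what the part's stream would yield).
        return part_name, content_type, pkg.read(part_name)


def image_to_html(docx_path, rel_id, out_dir):
    """Hypothetical helper: save the image into out_dir and return an <img> tag linking to it."""
    part_name, content_type, data = get_image_part(docx_path, rel_id)
    # Pick the file extension from the content type, falling back to the part's own.
    ext = mimetypes.guess_extension(content_type or "") or posixpath.splitext(part_name)[1]
    file_name = rel_id + ext
    with open(os.path.join(out_dir, file_name), "wb") as f:
        f.write(data)
    return f'<img src="{html.escape(file_name)}" alt="">'
```

    For example, image_to_html("input.docx", "rId5", "out") (hypothetical paths) writes the image into out/ and returns a tag to paste into the page. Naming the file after the relationship ID is just a simple way to keep names unique within one document.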

    Hope this helps. If you have any questions, please let me know.



    • Edited by Lu Zhang Tuesday, April 13, 2010 8:14 AM
    Friday, April 2, 2010 6:29 AM
  • How can I get the relationshipId of the images in a Word document?

    Another problem: I need to extract some of the images from the Word document, but not others.

    Friday, April 2, 2010 3:42 PM
  • Hi arthur tu,

    You can use the Open XML SDK Productivity Tool to open a document and inspect its file format. The relationshipId is stored like this:

    <a:blip r:embed="rId5" cstate="print">

    This is the Embed property of the Blip class. You can call Drawing.Descendants<Blip>() to find the Blip elements and then read Blip.Embed to get the relationshipId.
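    Outside the SDK, the same lookup can be sketched by reading word/document.xml directly. This is a standard-library Python sketch, not the SDK's Descendants<Blip>() call; it assumes you only care about the main document body (headers, footers and footnotes live in other parts), and blip_embed_ids is a hypothetical helper name. Walking every a:blip element and reading its r:embed attribute gives the relationship IDs in document order.

```python
import xml.etree.ElementTree as ET
import zipfile

# DrawingML and relationship namespaces used by <a:blip r:embed="...">.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def blip_embed_ids(docx_path):
    """Hypothetical helper: r:embed of every <a:blip> in word/document.xml, in order."""
    with zipfile.ZipFile(docx_path) as pkg:
        root = ET.fromstring(pkg.read("word/document.xml"))
    ids = []
    # iter() visits all descendants, similar in spirit to Descendants<Blip>().
    for blip in root.iter(f"{{{A_NS}}}blip"):
        rel_id = blip.get(f"{{{R_NS}}}embed")
        if rel_id is not None:  # linked (not embedded) pictures use r:link instead
            ids.append(rel_id)
    return ids
```

    Each ID returned can then be resolved to its image part as described in the first reply.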

    To extract only some of the images, you can identify them by the Name property of NonVisualDrawingProperties, which usually stores the name of the image file. You can read it the same way as Embed.
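    A minimal standard-library Python sketch of that filtering step, assuming each picture is a pic:pic element whose pic:nvPicPr/pic:cNvPr carries the name (I believe this is the element the SDK exposes as NonVisualDrawingProperties, but treat that mapping as an assumption) and whose pic:blipFill/a:blip carries r:embed. The helpers named_images and select_images are hypothetical; the name test is whatever predicate suits your documents.

```python
import xml.etree.ElementTree as ET
import zipfile

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PIC_NS = "http://schemas.openxmlformats.org/drawingml/2006/picture"


def named_images(docx_path):
    """Hypothetical helper: (cNvPr name, r:embed ID) for each <pic:pic> in word/document.xml."""
    with zipfile.ZipFile(docx_path) as pkg:
        root = ET.fromstring(pkg.read("word/document.xml"))
    result = []
    for pic in root.iter(f"{{{PIC_NS}}}pic"):
        # The name and the image reference sit in sibling children of the same picture.
        c_nv_pr = pic.find(f"{{{PIC_NS}}}nvPicPr/{{{PIC_NS}}}cNvPr")
        blip = pic.find(f"{{{PIC_NS}}}blipFill/{{{A_NS}}}blip")
        if c_nv_pr is None or blip is None:
            continue
        rel_id = blip.get(f"{{{R_NS}}}embed")
        if rel_id is not None:
            result.append((c_nv_pr.get("name", ""), rel_id))
    return result


def select_images(docx_path, wanted):
    """Relationship IDs of the pictures whose name satisfies wanted(name)."""
    return [rel_id for name, rel_id in named_images(docx_path) if wanted(name)]
```

    For instance, select_images(path, lambda n: n.startswith("chart")) keeps only pictures whose names start with "chart"; a list of pairs is returned rather than a dict because picture names are not guaranteed to be unique.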

    Hope this helps. If you have any questions, please let me know.



    Tuesday, April 6, 2010 2:21 AM