Microsoft.Office.Interop.Word: Convert HTML to Docx, embedding images into the docx during conversion

    Question

  • Hi all,

        I am trying to convert HTML to docx using Microsoft.Office.Interop.Word. It works fine for plain-text HTML. If the HTML file has images, though, the converted docx shows only the text, not the images. Every image shows an error ("x") and seems to be a hyperlink to the image file on my hard disk. Is there any way to "embed" the images in the docx instead of linking to them?

    Here is the code snippet:

    // Start Word hidden. Documents.Open below returns the document, so no
    // separate "new Document()" or Documents.Add call is needed beforehand.
    word = new Microsoft.Office.Interop.Word.Application();
    word.Visible = false;

    // Open the HTML file; Word converts it on load.
    wordDoc = word.Documents.Open(ref filepath, ref confirmconversion, ref readOnly, ref oMissing,
                                  ref oMissing, ref oMissing, ref oMissing, ref oMissing,
                                  ref oMissing, ref oMissing, ref oMissing, ref oMissing,
                                  ref oMissing, ref oMissing, ref oMissing, ref oMissing);

    // Save in the target format held in saveformat.
    object fileFormat = saveformat;
    wordDoc.SaveAs(ref saveto, ref fileFormat, ref oMissing, ref oMissing, ref oMissing,
                   ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
                   ref oMissing, ref oMissing, ref oMissing, ref oallowsubstitution, ref oMissing,
                   ref oMissing);

    Thanks

    • Moved by Kevin Pan Wednesday, July 14, 2010 8:39 AM (From:Windows Presentation Foundation (WPF))
    • Moved by Bessie Zhao Thursday, July 15, 2010 1:54 AM not related to be VSTO technology. (From:Visual Studio Tools for Office)
    Monday, July 12, 2010 11:31 PM

Answers

  • There is some misunderstanding here. I have an HTML file along with some images stored on my local hard disk, and I am trying to convert this HTML to a docx file. After I call doc.SaveAs(...), I get the docx file, but the images in it are "linked" to my local images, which means the images are not saved with the docx but only referenced. So if I delete my local images and open the docx again, it shows an "x" in every image spot.

    So I ended up looping through wordDoc.InlineShapes, finding all shapes of type wdInlineShapeLinkedPicture, and re-inserting each image as type wdInlineShapePicture at the same spot.
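
    The re-insert approach described above could be sketched in C# roughly as follows. This is an untested sketch, not the poster's actual code: the class and method names (LinkedPictureEmbedder, EmbedLinkedPictures) are hypothetical, and it assumes each linked picture's LinkFormat.SourceFullName still points at a file Word can read when the code runs.

```csharp
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper; the names are illustrative only.
static class LinkedPictureEmbedder
{
    // Replaces every linked inline picture with an embedded copy and
    // returns how many pictures were re-inserted.
    public static int EmbedLinkedPictures(Word.Document wordDoc)
    {
        int replaced = 0;

        // InlineShapes is 1-based. Walking backwards keeps the indexes that
        // are still to be visited stable while shapes are deleted and re-added.
        for (int i = wordDoc.InlineShapes.Count; i >= 1; i--)
        {
            Word.InlineShape shape = wordDoc.InlineShapes[i];
            if (shape.Type != Word.WdInlineShapeType.wdInlineShapeLinkedPicture)
                continue;

            // Remember the image file, its position, and its size.
            // Assumption: SourceFullName is a path Word can still open.
            string source = shape.LinkFormat.SourceFullName;
            Word.Range position = shape.Range;
            float width = shape.Width;
            float height = shape.Height;

            shape.Delete();

            // LinkToFile = false and SaveWithDocument = true ask Word to store
            // the image data in the document instead of a link to the file.
            object linkToFile = false;
            object saveWithDocument = true;
            object range = position;
            Word.InlineShape embedded = wordDoc.InlineShapes.AddPicture(
                source, ref linkToFile, ref saveWithDocument, ref range);

            // Restore the size the linked picture had.
            embedded.Width = width;
            embedded.Height = height;
            replaced++;
        }

        return replaced;
    }
}
```

    It would be called between Documents.Open and SaveAs, for example EmbedLinkedPictures(wordDoc), so that the saved docx carries the image data itself.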

    • Marked as answer by Nestlie Tuesday, July 20, 2010 5:44 PM
    Friday, July 16, 2010 11:04 PM
  • Hello,

    Thanks for the clarification. Now I understand that the image in the HTML file I used was not a linked one, which is why I could see the image after saving that HTML file as a docx file.

    For the issue you are seeing, the Word object model offers no direct way to change a picture of type wdInlineShapeLinkedPicture into another type. As far as I can see, the only way is to insert it again, as you said.

    If you have any concerns about this post, please feel free to follow up.

    Best regards,
    Bessie


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    • Marked as answer by Nestlie Tuesday, July 20, 2010 5:44 PM
    Tuesday, July 20, 2010 2:14 PM

All replies

  • After some further investigation, I found that there are two main types of InlineShapes: wdInlineShapeLinkedPicture (which looks like a hyperlinked image?) and wdInlineShapePicture (which looks like an image embedded in the docx?). How can I convert a wdInlineShapeLinkedPicture to a wdInlineShapePicture?
    • Proposed as answer by Gelu Vac Friday, July 01, 2011 7:51 PM
    Tuesday, July 13, 2010 1:30 AM
  • Hello,

    How do you check the types of these images? To save an HTML file as a .docx file, you need to specify the SaveFormat explicitly. On my side, I ran a simple test of this scenario: first I saved a web page as an .htm file, then used the code below:

                Word.Application wordApp = new Word.Application();
                wordApp.Visible = true;
                Word.Document doc = wordApp.Documents.Open(@"C:\Test\1.htm");
                doc.SaveAs2(@"C:\Test\1.docx", Word.WdSaveFormat.wdFormatDocumentDefault);

    Finally, I could see the image in the new document. If you have any concerns about this, please feel free to follow up.

    Best regards,
    Bessie

     


    Thursday, July 15, 2010 6:30 AM
  • When I saw this post I was so hoping that it would include a solution.  I ran into the same problem this week.  I'm generating Word files using XSLT, and the images in the files were created as links rather than an embedded image file.  Alas, no solution was given, so I had to figure it out myself.  Here's a solution that works for me.  It mirrors the manual process of "Prepare -> Edit Links to Files -> Save Picture in Document":

    VB.Net

    For i As Integer = 1 to wordDoc.InlineShapes.Count
        If Not wordDoc.InlineShapes(i).LinkFormat Is Nothing AndAlso wordDoc.InlineShapes(i).LinkFormat.SavePictureWithDocument = False Then
            wordDoc.InlineShapes(i).LinkFormat.SavePictureWithDocument = True
        End If
    Next
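
    Since the original question used C#, the same loop might look like this there. It is a direct, untested translation of the VB.Net snippet above; the class and method names are hypothetical.

```csharp
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper; the names are illustrative only.
static class LinkedPictureSaver
{
    // Turns on "Save Picture in Document" for every linked inline picture
    // that does not already have it set.
    public static void SaveLinkedPicturesWithDocument(Word.Document wordDoc)
    {
        // InlineShapes is 1-based, as in the VB.Net loop.
        for (int i = 1; i <= wordDoc.InlineShapes.Count; i++)
        {
            Word.LinkFormat link = wordDoc.InlineShapes[i].LinkFormat;

            // Mirrors the "Is Nothing" check in the VB.Net version.
            if (link != null && !link.SavePictureWithDocument)
            {
                link.SavePictureWithDocument = true;
            }
        }
    }
}
```

    As with the VB.Net version, this would run after the document is opened and before it is saved.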

    Thursday, August 05, 2010 10:21 PM
  • Hi All,

    This is my first post. 

    I am developing a web application in .NET 1.1 (Visual Studio 2003).

    I am using Microsoft.Interop.Word DLL in my web project. 

    I used this code: http://social.msdn.microsoft.com/Forums/en-SG/worddev/thread/587c1977-7850-4577-a74a-289461fcf61a

    -------

    I changed .doc to .rtf in the code, as I wanted my file to be RTF.

    With this I am able to embed the images in the generated RTF file. But the main problems I am facing are:

    1. The HTML I am using has three <html> blocks, like this:

       <html>
       </html>

       <html>
       </html>

       <html>
       </html>

    So I remove the inner <html> tags, put everything into one <html> block, rewrite the HTML file, and then convert it.

    The generated RTF file has the complete body of the HTML, but the paging is wrong: the header that should appear on the 2nd page appears at the bottom of the 1st page.

    Please help me with this. Is there any property with which we can set the page break dynamically (while converting)?

    2. It does not read the CSS file for the <table> tag in the HTML file, i.e. in the RTF file the table comes out with an enlarged font and left-aligned, which is wrong.

     

    Please help. 

    Thanks in Advance...


    Regards Ak
    Saturday, April 16, 2011 4:25 PM
  • Indeed, the image object is a local reference.

    So all you need to do is make sure the image exists at the end of that link: either in the root folder where you store your .html file or where you save the .docx file.

    You can also try saving in PDF format. The only difference is that you no longer need the link to the physical image file afterwards; the HTML source file still has to have it.

    I just finished an example that I can provide if you need it.

    Cheers.

    Friday, July 01, 2011 7:55 PM
  • Hi All,

    I am new to this. I have an RTF field into which I uploaded an image, and I stored it in the database.

    When I open a report with the RTF field data in docx format, I am not able to see the image.

    Can anyone help me with this?

    Cheers


    • Edited by smt_p Tuesday, July 24, 2012 12:12 PM
    Tuesday, July 24, 2012 12:11 PM