Convert HTML to Word Document

    Question

  •  Hi! I wish to convert an HTML file to a Word document. Is there any way to save it as (*.doc) using the Microsoft.Office.Interop library? Or is there an alternative method besides changing the file extension from (*.html) to (*.doc)? I wish to do it programmatically in C#. Thanks.
    Tuesday, September 02, 2008 1:00 AM

Answers

  •  Hi! I found a solution elsewhere. You can open the HTML file in Word using Microsoft.Office.Interop.Word and save it in the *.doc file format. Here is the sample code that I used:

    // filename holds the path of the HTML file to convert.
    object filename1 = filename;
    object oMissing = System.Reflection.Missing.Value;
    object readOnly = false;
    object oFalse = false;

    // Start Word without showing its window. Documents.Open returns the
    // document, so no blank document needs to be created first.
    Microsoft.Office.Interop.Word.Application oWord = new Microsoft.Office.Interop.Word.Application();
    oWord.Visible = false;

    // Open the HTML file; Word converts it on load.
    Microsoft.Office.Interop.Word.Document oDoc = oWord.Documents.Open(
        ref filename1, ref oMissing, ref readOnly, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing);

    // Save a copy in the Word document (*.doc) format.
    filename1 = @"D:\FileConverter\Temp\new.doc";
    object fileFormat = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatDocument;
    oDoc.SaveAs(
        ref filename1, ref fileFormat, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing,
        ref oMissing, ref oMissing, ref oMissing, ref oMissing);

    // Close the document without saving further changes, then quit Word.
    oDoc.Close(ref oFalse, ref oMissing, ref oMissing);
    oWord.Quit(ref oMissing, ref oMissing, ref oMissing);

    Regards,
    Yuen Li

    • Marked as answer by Yuen Li Tuesday, September 23, 2008 12:22 AM
    Tuesday, September 23, 2008 12:22 AM
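
    One thing to watch with the code above: if Open or SaveAs throws, Close and Quit never run, and the hidden Word process stays in memory. Below is a minimal sketch of the same conversion wrapped in try/finally. It assumes C# 4 or later (so the optional COM arguments can be left out) and a local Word installation. The class name HtmlToDocConverter and the method name Convert are hypothetical, not part of the original answer.

```csharp
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper; the names are illustrative, not from the original answer.
public static class HtmlToDocConverter
{
    public static void Convert(string htmlPath, string docPath)
    {
        Word.Application word = new Word.Application();
        Word.Document doc = null;
        try
        {
            word.Visible = false;

            // Word converts the HTML when it opens the file.
            doc = word.Documents.Open(FileName: htmlPath, ReadOnly: true);

            // Same save format as in the answer above.
            doc.SaveAs(FileName: docPath,
                       FileFormat: Word.WdSaveFormat.wdFormatDocument);
        }
        finally
        {
            // Runs even if Open or SaveAs throws, so Word is always shut down.
            // The casts pick the Close/Quit methods over the events of the same name.
            if (doc != null)
            {
                ((Word._Document)doc).Close(SaveChanges: false);
                Marshal.ReleaseComObject(doc);
            }
            ((Word._Application)word).Quit(SaveChanges: false);
            Marshal.ReleaseComObject(word);
        }
    }
}
```

    A call would look like HtmlToDocConverter.Convert(@"D:\FileConverter\Temp\page.html", @"D:\FileConverter\Temp\new.doc"); the first path is made up for illustration. Absolute paths are the safer choice, since Word resolves relative paths against its own working directory.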

All replies

  • Hi, I am new to this website. There is a lot of great information on here.

    I was wondering if you ever found a way to do this, perhaps through another avenue. I am curious about doing this myself.

    Thank you for your help.

    Best wishes to you.

    Bea
    Monday, September 22, 2008 10:33 PM
  • Yikes!! This looks scary. I'll have to try it.

    Thanks!

    Bea
    Friday, September 26, 2008 2:25 AM
  • Hello,

    I have done the same, and I have also applied some formatting to the document object, such as a header, footer, and border. The document is created successfully. My problem now is that my content is in HTML format, and I want it to appear as formatted HTML in the Word document I created. For that I used

    object fileFormat = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatFilteredHTML;

    With this I get the content rendered properly, but the formatting I applied has no effect: the header and footer are not displayed. When I change the line to

    object fileFormat = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatDocument;

    the header and footer are displayed, but again I do not get the HTML-formatted text in the Word document.

    So solving one problem creates the other. Please help me sort this out.

    Thanks,

    Kaushal Pathak.
    Tuesday, May 26, 2009 7:10 AM
  • Hi,

    I also attempted this scenario but ran into a lot of the problems that you have mentioned. In the end I created the document using a combination of Open XML SDK 2.0 and AltChunks to insert the formatted HTML (see code snippet below).

    The output file is in Word 2007 format, so you will then need to use Microsoft.Office.Interop.Word to save the document as a 2003 doc.

    Regards,
    Luke

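    // Needs System.IO, System.Linq, System.Xml.Linq and DocumentFormat.OpenXml.Packaging.
    // Not shown in this post and assumed from context: `document` is an open
    // WordprocessingDocument, `fieldName` is the tag of the content control to
    // replace, W and R are helper classes of XName constants, and
    // GetXDocument/PutXDocument are helper extension methods that load and
    // save a part as LINQ to XML.
    string html = @"<html><head/><body>Hello word!!</body></html>";
    string altChunkId = "AltChunkId1";

    // Store the HTML in an alternative format import part of the main document.
    MainDocumentPart mainPart = document.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        "application/xhtml+xml", altChunkId);
    using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
    using (StreamWriter stringStream = new StreamWriter(chunkStream))
        stringStream.Write(html);

    // The original post used altChunk without defining it; presumably it was
    // a w:altChunk element that points at the part through its id.
    XElement altChunk = new XElement(W.altChunk, new XAttribute(R.id, altChunkId));

    // Find the content control whose tag matches fieldName.
    XElement FieldContentControl = document
        .MainDocumentPart
        .GetXDocument()
        .Root
        .Element(W.body)
        .Elements(W.sdt)
        .Where(sdt =>
            fieldName == (string)sdt
                .Element(W.sdtPr)
                .Element(W.tag)
                .Attribute(W.val))
        .FirstOrDefault();

    // Swap the content control for the altChunk and write the XML back.
    FieldContentControl.ReplaceWith(altChunk);
    document.MainDocumentPart.PutXDocument();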
    Thursday, February 11, 2010 5:37 PM
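
    The snippet above relies on several names it does not define: W, GetXDocument and PutXDocument. They look like small LINQ to XML helpers rather than part of the Open XML SDK itself. The following is a sketch of what such helpers might look like, not Luke's actual code. The class name PartExtensions is an assumption, and the R class is included only because a w:altChunk element refers to its part through an r:id attribute.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

// XName constants for the WordprocessingML elements used in the snippet.
public static class W
{
    public static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XName body = w + "body";
    public static readonly XName sdt = w + "sdt";
    public static readonly XName sdtPr = w + "sdtPr";
    public static readonly XName tag = w + "tag";
    public static readonly XName val = w + "val";
    public static readonly XName altChunk = w + "altChunk";
}

// XName constant for the relationship id attribute (r:id).
public static class R
{
    public static readonly XNamespace r =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    public static readonly XName id = r + "id";
}

// Hypothetical extension class; the name is illustrative.
public static class PartExtensions
{
    // Loads the part's XML once and caches it on the part as an annotation.
    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
            return xdoc;
        using (Stream stream = part.GetStream())
        using (XmlReader reader = XmlReader.Create(stream))
            xdoc = XDocument.Load(reader);
        part.AddAnnotation(xdoc);
        return xdoc;
    }

    // Writes the cached XDocument back into the part, replacing its content.
    public static void PutXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.GetXDocument();
        using (Stream stream = part.GetStream(FileMode.Create, FileAccess.Write))
        using (XmlWriter writer = XmlWriter.Create(stream))
            xdoc.Save(writer);
    }
}
```

    With these in place, document in the snippet could come from WordprocessingDocument.Open(path, true) inside a using block, where path is the .docx file that contains the tagged content control. Caching the XDocument as an annotation is what lets PutXDocument() save the same tree that GetXDocument() handed out earlier.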