Can we use HTML Agility Pack to get the HTML source of an MS Word document?

  • Question

  • We know that we can save a Word document as a web page. We can then open that web page and view its HTML by right-clicking the page and clicking ‘View Source’.

    Can we programmatically access the HTML source of a Word document using HTML Agility Pack, and if so, how? This was possible in Office 2000 VBA, as explained here.

    I am creating a VSTO 2010 AddIn for Word 2007 using C#.


    • Edited by namwam Saturday, January 26, 2013 4:31 AM Added a reference
    Saturday, January 26, 2013 1:00 AM


  • If you want to access the contents of a Word 2007 document without using automation, you can use the Open XML SDK. If, however, you are set on HTML Agility Pack, I think you would have to use automation to open the document, call SaveAs with an HTML format, and then open the resulting file with HTML Agility Pack.
    Saturday, January 26, 2013 6:51 AM
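A minimal C# sketch of the automation route described in the answer, assuming a project reference to the Word primary interop assembly (Microsoft.Office.Interop.Word) and the HtmlAgilityPack library, compiled with C# 4 or later (VS 2010) so that COM optional and named arguments can be used. The class name `WordHtmlExport` and the temp-file handling are illustrative only, not part of the original answer. The sketch starts its own hidden Word instance and quits it. Inside a VSTO add-in you would more likely use `Globals.ThisAddIn.Application` and must not call `Quit` on the host. It saves with `wdFormatFilteredHTML`. `wdFormatHTML` would keep Word's Office-specific markup instead.

```csharp
using System.IO;
using HtmlAgilityPack;
using Word = Microsoft.Office.Interop.Word;

public static class WordHtmlExport
{
    // Opens a Word document through automation, saves it as HTML to a temp
    // file, then loads that file with HTML Agility Pack.
    public static HtmlDocument GetHtml(string docPath)
    {
        string htmlPath = Path.Combine(Path.GetTempPath(),
                                       Path.GetRandomFileName() + ".htm");

        // Separate, hidden Word instance (assumption: not the add-in's host).
        var app = new Word.Application { Visible = false };
        Word.Document doc = null;
        try
        {
            // Open read-only so the original document is never modified.
            doc = app.Documents.Open(docPath, ReadOnly: true, Visible: false);

            // Filtered HTML strips most Office-specific markup;
            // Word.WdSaveFormat.wdFormatHTML would keep it.
            doc.SaveAs(htmlPath, Word.WdSaveFormat.wdFormatFilteredHTML);
        }
        finally
        {
            // The casts avoid the method/event name clash on Close and Quit.
            if (doc != null)
                ((Word._Document)doc).Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            ((Word._Application)app).Quit();
        }

        var html = new HtmlDocument();
        html.Load(htmlPath);

        // Word may also write a companion "_files" folder (images etc.);
        // only the .htm file itself is removed here.
        File.Delete(htmlPath);
        return html;
    }
}
```

Usage, with a hypothetical path: `HtmlDocument html = WordHtmlExport.GetHtml(@"C:\temp\report.docx");` after which `html.DocumentNode.OuterHtml` holds the HTML source and `html.DocumentNode.SelectNodes("//p")` returns the paragraph elements.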