How to convert Word tables to clean HTML format using C#

  • Question

  • Hi all

    I have a Word file with a few tables in it. When I convert it to filtered HTML using C#, Word adds extra tags for the tables. How do I remove them?

    I just want the text-align, bold, italic, underline, cell width, and cell border tags in the HTML.

    How can I remove the unnecessary tags?

    Thanks in advance for your help.

    Monday, June 2, 2014 7:10 AM

Answers

  • Hi koolprasad

    Expanding on Eugene's reply: Word provides only limited options on how to format the HTML it exports. The main reason for a converter to save to HTML format is to provide a "round-trip" capability for viewing/editing Word documents in a browser, then being able to edit them again in Word without losing information the browser doesn't support. That's the reason for the "extra tags".

    That said, it is possible to strip some of the excess away by saving as "filtered HTML", using the enum member wdFormatFilteredHTML (from WdSaveFormat) for the second parameter of SaveAs (or SaveAs2).
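
    A minimal sketch of that call through the Word interop assembly might look like the following. It assumes a reference to Microsoft.Office.Interop.Word and Word 2010 or later (for SaveAs2); the file paths and the class name are hypothetical placeholders, not anything from this thread.

```csharp
using Word = Microsoft.Office.Interop.Word;

static class FilteredHtmlExport
{
    static void Main()
    {
        // Hypothetical paths, used only for illustration.
        string source = @"C:\temp\tables.docx";
        string target = @"C:\temp\tables.htm";

        var app = new Word.Application();
        try
        {
            // Open read-only so the source document is never modified.
            Word.Document doc = app.Documents.Open(FileName: source, ReadOnly: true);
            try
            {
                // FileFormat is the second parameter of SaveAs2.
                doc.SaveAs2(FileName: target,
                            FileFormat: Word.WdSaveFormat.wdFormatFilteredHTML);
            }
            finally
            {
                // Cast to _Document to avoid the Close method/event ambiguity warning.
                ((Word._Document)doc).Close(SaveChanges: false);
            }
        }
        finally
        {
            // Always quit, otherwise a hidden WINWORD.EXE process stays running.
            ((Word._Application)app).Quit();
        }
    }
}
```

    Even the filtered output still carries class names and inline styles, so some post-processing is usually needed if only a handful of formatting attributes should survive.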


    Cindy Meister, VSTO/Word MVP, my blog

    Monday, June 2, 2014 2:31 PM
    Moderator

All replies

  • Hello,

    > How can I remove unnecessary tags   ?

    You can open the generated HTML file and remove the unnecessary tags. Does that make sense?

    Monday, June 2, 2014 9:11 AM
  • Thanks for the reply, but I want to do it programmatically using C#. How can I do that?
    Monday, June 2, 2014 12:40 PM
  • The Word object model doesn't provide anything for this. I'd recommend asking such questions in the Visual C# forum instead. There you will get the most qualified feedback.
    Monday, June 2, 2014 1:14 PM
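
    Since the Word object model offers nothing here, the cleanup has to happen on the saved HTML text itself. The sketch below is one possible approach using only System.Text.RegularExpressions: it drops comments, removes class and lang attributes, and filters each style attribute down to a whitelist. The class name WordHtmlCleaner and the list of kept CSS properties are assumptions based on the wish list in the question. Regular expressions are not a real HTML parser, so treat this as a starting point rather than a robust solution.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordHtmlCleaner
{
    // CSS properties to keep (assumed from the question's wish list).
    // "border" also covers border-top, border-collapse, and so on.
    static readonly string[] KeptProperties =
        { "text-align", "font-weight", "font-style", "text-decoration", "width", "border" };

    public static string Clean(string html)
    {
        // Drop HTML comments, including Word's conditional comments.
        html = Regex.Replace(html, @"<!--.*?-->", "", RegexOptions.Singleline);

        // Drop class and lang attributes (quoted or unquoted values).
        html = Regex.Replace(html,
            @"\s+(?:class|lang)\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)", "",
            RegexOptions.IgnoreCase);

        // Rewrite each quoted style attribute, keeping only whitelisted declarations.
        html = Regex.Replace(html, @"\s+style\s*=\s*([""'])(.*?)\1", FilterStyle,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return html;
    }

    static string FilterStyle(Match m)
    {
        string quote = m.Groups[1].Value;
        string[] kept = m.Groups[2].Value
            .Split(';')
            .Select(d => d.Trim())
            .Where(IsKept)
            .ToArray();

        // Remove the attribute entirely when nothing survives the filter.
        return kept.Length == 0
            ? ""
            : " style=" + quote + string.Join(";", kept) + quote;
    }

    static bool IsKept(string declaration)
    {
        int colon = declaration.IndexOf(':');
        if (colon <= 0) return false;
        string name = declaration.Substring(0, colon).Trim().ToLowerInvariant();
        return KeptProperties.Any(p =>
            name == p || name.StartsWith(p + "-", StringComparison.Ordinal));
    }
}
```

    Tags such as <b>, <i>, <u> and plain width attributes on table cells are left untouched, so bold, italic, underline and cell widths survive. A call could look like File.WriteAllText(path, WordHtmlCleaner.Clean(File.ReadAllText(path))), possibly with an explicit encoding, since Word does not necessarily save the file as UTF-8.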