HtmlUtilities.ConvertToText CSS style converting issue

  • Question

  • Hi,

    Converting HTML to text with HtmlUtilities.ConvertToText gives the wrong result for this HTML:

    <html><head><style type='text/css'>p { margin: 0; }</style></head><body>
    <div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'>
    <style>p { margin: 0; }</style>
    <div style=""font-family: arial,helvetica,sans-serif; font-size: 10pt; color: rgb(0, 0, 0);"">
    <br><br>--<br><div><span></span>Regards,<br>Windows 8 Dev<br><br><span></span><br></div>
    </div></div></body></html>

    It gives: margin: 0;

    How can I keep the CSS style rules from being converted to text? Is there any utility available in C#?

    Thanks


    Dheeraj PK http://dheerajpk.wordpress.com/

    Thursday, October 24, 2013 3:23 PM

All replies

  • Hi Dheeraj,

    I ran your code and modified it a bit; the result now appears to be correct.

    Let's look at your HTML. I have formatted it and added comments:

    <html>
      <head>
        <style type='text/css'>p { margin: 0;}</style>
      </head>
      <body>
        <div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'>
          <!-- This is where "margin: 0;" comes from; removing this line fixes it.
               HtmlUtilities only recognizes correctly formatted HTML. -->
          <style>p { margin: 0; }</style>
          <!-- Doubled double quotes ("") are not valid in an HTML attribute; change them to single quotes. -->
          <div style=""font-family: arial,helvetica,sans-serif; font-size: 10pt; color: rgb(0, 0, 0);"">
            <br><br>-- <br>
            <div>
              <span></span> Regards, <br>Windows 8 Dev<br>
              <br><span></span><br>
            </div>
          </div>
        </div>
      </body>
    </html>
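The doubled quotes in the question look like C# verbatim-string escaping; that is an assumption, since the thread does not show the C# source. Inside a verbatim literal (@"..."), a doubled quote "" is compiled into a single " character, so the string handed to HtmlUtilities at run time would contain ordinary quotes. A minimal sketch; the class name VerbatimDemo is hypothetical:

```csharp
// Hypothetical demo: in a C# verbatim string literal (@"..."),
// each doubled quote ("") is compiled into a single " character.
public static class VerbatimDemo
{
    // An attribute written the way it appears in the question.
    public const string Fragment =
        @"<div style=""font-family: arial,helvetica,sans-serif; font-size: 10pt;"">";
}
```

Calling Console.WriteLine(VerbatimDemo.Fragment) would print <div style="font-family: arial,helvetica,sans-serif; font-size: 10pt;">, so in that case only the misplaced <style> element would remain as a problem.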

    Best Regards,

    --James


    <THE CONTENT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, WHETHER EXPRESS OR IMPLIED>
    Thanks
    MSDN Community Support

    Please remember to "Mark as Answer" the responses that resolved your issue. It is a common way to recognize those who have helped you, and makes it easier for other visitors to find the resolution later.


    Friday, October 25, 2013 6:49 AM
    Moderator
  • Thanks for your reply. The HTML renders in a web browser without any issue.

    Dheeraj PK http://dheerajpk.wordpress.com/

    Friday, October 25, 2013 1:56 PM
  • What I mean is that HtmlUtilities.ConvertToText cannot recognize a <style> element inside the HTML <body>. The markup works fine in a web browser (the browser probably just tolerates the error), but it does not validate against the HTML DTD: <style> should always be in <head>.
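The condition described above (a <style> element after the opening <body> tag) can be detected before conversion. A minimal sketch, assuming a simple regex scan is good enough for the input; it is not a real HTML parser, and the names StyleCheck and HasStyleInBody are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: reports whether any <style> element appears
// after the opening <body> tag, the case HtmlUtilities mishandles.
public static class StyleCheck
{
    // Opening <body> tag, with or without attributes, any letter case.
    private static readonly Regex BodyOpen =
        new Regex(@"<body\b[^>]*>", RegexOptions.IgnoreCase);

    // Opening <style> tag, with or without attributes, any letter case.
    private static readonly Regex StyleOpen =
        new Regex(@"<style\b[^>]*>", RegexOptions.IgnoreCase);

    public static bool HasStyleInBody(string html)
    {
        Match body = BodyOpen.Match(html);
        if (!body.Success)
            return false; // no <body> tag, nothing to check

        // Look for a <style> tag only in the text after the <body> tag,
        // so <style> elements inside <head> are ignored.
        return StyleOpen.IsMatch(html, body.Index + body.Length);
    }
}
```

A Store app could run this check first and only clean the string when it returns true.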

    --James


    Saturday, October 26, 2013 12:31 AM
    Moderator
  • Hi,

    Thanks for your reply. But my HTML content is dynamic (retrieved from a web server), so removing or replacing that markup will be difficult. Could you suggest a workaround?

    (Is this a known issue in Windows 8 Store apps?)

    Thanks


    Dheeraj PK http://dheerajpk.wordpress.com/

    Monday, October 28, 2013 12:40 PM
  • Hi Dheeraj,

    HtmlUtilities only converts HTML strings correctly if they validate against the HTML DTD.

    Could you tell me how you want to use this HTML text, and where you want to display it?

    WebView is a good option; the result would be the same as in IE10 (Windows 8) or IE11 (Windows 8.1).

    Alternatively, display it in a RichTextBlock; see the article "Displaying HTML Content in a RichTextBlock".

    Best Regards,

    --James


    Monday, October 28, 2013 1:55 PM
    Moderator
  • Actually, I want to convert the HTML to plain text and display it in a TextBlock on a XAML page (shown like an email content preview in a ListBox).
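For a ListBox preview, the converted text could be flattened and shortened before it is bound to the TextBlock. A minimal sketch, assuming the input is the plain string returned by HtmlUtilities.ConvertToText; the PreviewText class, the MakePreview method and its maxLength parameter are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for a one-line list preview.
public static class PreviewText
{
    public static string MakePreview(string plainText, int maxLength)
    {
        if (maxLength < 1)
            throw new ArgumentOutOfRangeException(nameof(maxLength));

        // Collapse every run of whitespace (spaces, tabs, line breaks)
        // into a single space and trim both ends.
        string collapsed = Regex.Replace(plainText ?? string.Empty, @"\s+", " ").Trim();
        if (collapsed.Length <= maxLength)
            return collapsed;

        // Cut the text and append one ellipsis character, keeping the
        // total length at most maxLength.
        return collapsed.Substring(0, maxLength - 1).TrimEnd() + "\u2026";
    }
}
```

For example, MakePreview("abcdefghij", 5) returns "abcd…".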

    Dheeraj PK http://dheerajpk.wordpress.com/

    Monday, October 28, 2013 4:41 PM
  • Hi Dheeraj

    I don't think there is any built-in function other than HtmlUtilities for converting HTML to a plain string.

    Take a look at FizzlerEx: http://fizzlerex.codeplex.com/. It supports WinRT and is based on HtmlAgilityPack.

    --James


    Tuesday, October 29, 2013 3:03 AM
    Moderator
  • Thanks for your reply. I fixed it by using a regex to remove all <style> tags from the HTML and then passing the result to HtmlUtilities.

    Thanks.


    Dheeraj PK http://dheerajpk.wordpress.com/

    Friday, November 1, 2013 3:32 AM
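The regex workaround can be sketched as follows. The actual regex used in the thread was not posted, so this is only a minimal sketch assuming the style blocks are well-formed <style>...</style> pairs; StyleStripper and StripStyleBlocks are hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: drops every <style>...</style> block
// (in <head> or <body>) before the HTML is converted to text.
public static class StyleStripper
{
    // IgnoreCase matches <STYLE> as well; Singleline lets '.' span
    // line breaks so multi-line blocks are removed. The lazy .*?
    // stops at the first closing </style> tag.
    private static readonly Regex StyleBlock = new Regex(
        @"<style\b[^>]*>.*?</style\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string StripStyleBlocks(string html) =>
        StyleBlock.Replace(html, string.Empty);
}
```

In a Windows Store app the cleaned string would then go to Windows.Data.Html.HtmlUtilities.ConvertToText(StyleStripper.StripStyleBlocks(html)). A regex is not an HTML parser: a literal "</style>" inside a CSS comment or string, for example, would end the match early, so for messier input a parser such as HtmlAgilityPack is the more robust route.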