How to parse HTML by HTMLDocument

  • Question

  • For example, the HTML like:
    <html>
    <head>
    <link type="text/css" rel="Stylesheet" href="./sample.css">
    </head>

    <body>
    <font id="myfont">this is a sample </font>
    </body>
    </html>


    It imports a CSS file like this:
    #myfont {
      color : red;
    }

    .myClass {
      color : blue;
    }


    If I use HTMLDocument, can I get information like this:
    "#myfont" has selected "font"
    ".myClass" has not been used
    Sunday, July 13, 2008 10:41 AM

Answers

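  • I think you can get a COM interop wrapper for IHTMLBodyElement-like objects this way:

            // Requires a reference to Microsoft.mshtml (using mshtml;).
            private void Form1_Load(object sender, EventArgs e)
            {
                // Subscribe before navigating so the event cannot be missed.
                webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);

                this.webBrowser1.Navigate("http://www.google.com");
            }

            // DocumentCompleted is used because Document.Body may still be null when Navigated fires.
            void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
            {
                // DomElement exposes the underlying MSHTML COM object.
                IHTMLBodyElement body = webBrowser1.Document.Body.DomElement as IHTMLBodyElement;
            }

    You can then get the information the same way you would in JavaScript code.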

    • Marked as answer by Zhi-Xin Ye Friday, July 18, 2008 4:10 AM
    Monday, July 14, 2008 10:07 AM

All replies

  • HtmlDocument is rather limited in what it can do. Worse yet, it has no public constructor, so you cannot use it outside the context of a Windows Forms WebBrowser control.
    Sunday, July 13, 2008 2:28 PM
  • BinaryCoder said:

    HtmlDocument is rather limited in what it can do. Worse yet, it has no public constructor, so you cannot use it outside the context of a Windows Forms WebBrowser control.



    If I get an HTMLDocument instance from a WebBrowser control, can I then get the information I want?
    Monday, July 14, 2008 2:47 AM
  • Killmyday said:

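    I think you can get a COM interop wrapper for IHTMLBodyElement-like objects this way:

            // Requires a reference to Microsoft.mshtml (using mshtml;).
            private void Form1_Load(object sender, EventArgs e)
            {
                // Subscribe before navigating so the event cannot be missed.
                webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);

                this.webBrowser1.Navigate("http://www.google.com/");
            }

            // DocumentCompleted is used because Document.Body may still be null when Navigated fires.
            void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
            {
                // DomElement exposes the underlying MSHTML COM object.
                IHTMLBodyElement body = webBrowser1.Document.Body.DomElement as IHTMLBodyElement;
            }

    You can then get the information the same way you would in JavaScript code.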



    Sorry, I don't quite understand. By handling an event, can I find out which CSS rules have not been used?
    Monday, July 14, 2008 1:18 PM
  • I don't think IE raises an event whenever a CSS rule is changed or applied to a specific element. I think you will have to traverse the DOM tree manually and check whether a given IHTMLElement has a CSS rule applied. :(
    Tuesday, July 15, 2008 10:58 AM
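
The manual traversal suggested in the last reply could look roughly like the sketch below. It is only an illustration under several assumptions: ElementInfo, SelectorUsage, Matches and Report are hypothetical names, not part of Windows Forms or MSHTML; only the simple selectors "#id", ".class" and "tag" are handled; and the selector list is supplied by hand rather than read from the style sheet. In a WinForms application the ElementInfo list would presumably be filled by looping over webBrowser1.Document.All and reading each element's TagName, Id and GetAttribute("className").

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the values read from one DOM element.
public sealed class ElementInfo
{
    public readonly string TagName;
    public readonly string Id;
    public readonly string ClassName;

    public ElementInfo(string tagName, string id, string className)
    {
        TagName = tagName;
        Id = id;
        ClassName = className;
    }
}

public static class SelectorUsage
{
    // Assumption: only "#id", ".class" and "tag" selectors are supported.
    public static bool Matches(string selector, ElementInfo element)
    {
        if (selector.StartsWith("#", StringComparison.Ordinal))
        {
            return string.Equals(selector.Substring(1), element.Id, StringComparison.Ordinal);
        }

        if (selector.StartsWith(".", StringComparison.Ordinal))
        {
            // The class attribute may hold several space-separated names.
            string[] names = (element.ClassName ?? "")
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            return Array.IndexOf(names, selector.Substring(1)) >= 0;
        }

        // Tag names are compared case-insensitively (IE reports them in upper case).
        return string.Equals(selector, element.TagName, StringComparison.OrdinalIgnoreCase);
    }

    // Produces one line per selector, in the format used in the question.
    public static List<string> Report(IList<string> selectors, IList<ElementInfo> elements)
    {
        var lines = new List<string>();
        foreach (string selector in selectors)
        {
            var matchedTags = new List<string>();
            foreach (ElementInfo element in elements)
            {
                if (Matches(selector, element))
                {
                    matchedTags.Add(element.TagName);
                }
            }

            if (matchedTags.Count > 0)
            {
                lines.Add("\"" + selector + "\" has selected \"" + string.Join(", ", matchedTags) + "\"");
            }
            else
            {
                lines.Add("\"" + selector + "\" has not been used");
            }
        }
        return lines;
    }
}
```

Calling SelectorUsage.Report with the selectors "#myfont" and ".myClass" and the elements of the sample page yields one line per selector, reporting either the matched tag names or that the selector has not been used. Descendant selectors, grouped selectors and pseudo-classes would need a real selector parser, which this sketch does not attempt.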