How to extract hyperlinks from HTML?

  • Question

  • Hello,

    When I use VB.NET to download a web page's HTML as a string, some hyperlinks are missing from the string. When I view the page's HTML in the web browser (via "View Source"), some links appear underlined, and you have to click them to see the link they contain. The link does not show up in the HTML string unless you click the underlined link within the HTML that contains it.

    I wonder how to extract those links from the HTML using VB.NET. They are hyperlinks inside the HTML document itself.


    Saturday, June 16, 2012 11:41 AM

Answers

  • It looks like you could use this HTML Agility Pack example as the basis for a solution. Or are the links generated by JavaScript when the page is loaded?
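
    As a rough illustration of the HTML Agility Pack approach, here is a minimal sketch. It assumes the HtmlAgilityPack package is referenced in the project; the module name LinkExtractor, the function name ExtractLinks, and the sample markup are hypothetical and not taken from the linked example. It only finds links present in the markup it is given, so links that JavaScript adds after the page loads will not appear.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports HtmlAgilityPack

    Module LinkExtractor
        ' Hypothetical helper: returns the href value of every <a> element that has one.
        Function ExtractLinks(html As String) As List(Of String)
            Dim links As New List(Of String)()
            Dim doc As New HtmlDocument()
            doc.LoadHtml(html)

            ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
            Dim anchors As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
            If anchors Is Nothing Then Return links

            For Each anchor As HtmlNode In anchors
                links.Add(anchor.GetAttributeValue("href", String.Empty))
            Next
            Return links
        End Function

        Sub Main()
            ' Sample markup for illustration only: one anchor with an href, one without.
            Dim sample As String = "<p><a href=""http://example.com/a"">A</a> <a name=""x"">no href</a></p>"
            For Each link As String In ExtractLinks(sample)
                Console.WriteLine(link)
            Next
        End Sub
    End Module
    ```

    The same function could be fed the string you already download, for example the result of WebClient.DownloadString.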

    HTH,

    Andrew


    Saturday, June 16, 2012 2:59 PM
  • Is this what you are talking about? I quoted it from the link that follows:

    "Also, you can get only some hyperlinks you expect according to your requirement like this:

            ' Collect every <a> element from the document loaded in the WebBrowser control.
            Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
            For Each curElement As HtmlElement In theElementCollection
                ' Keep only the links whose visible text contains the keyword.
                If curElement.GetAttribute("innerText").Contains("LinkTextKeywords") Then
                    ListBox1.Items.Add(curElement.GetAttribute("href"))
                End If
            Next"

    http://social.msdn.microsoft.com/Forums/en-US/vbgeneral/thread/f0cf865d-f4ed-491f-b3db-6e295e3b9f14


    You've taught me everything I know but not everything you know.

    • Marked as answer by Mark Liu-lxf Tuesday, June 26, 2012 7:46 AM
    Saturday, June 16, 2012 3:49 PM
