Visual Basic 2008: Extracting Div Tags, Title Tags, Keyword Tags, Parsing Div Tags, etc.

  • Saturday, November 07, 2009 10:02 PM, VBNETMAN
     
    Hi Friends!,

    I was wondering how to extract or parse particular tags (whichever I specify) from web pages. I know how to extract text and links from web pages, but when I tried the same method, shown in the following code, for div tags, title tags, and so on, it doesn't seem to work:

    Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
    For Each curElement As HtmlElement In theElementCollection
        If curElement.GetAttribute("href").Contains("http://twitter.com/") Then
            TextBox2.Text += curElement.GetAttribute("innerText") & vbCrLf
        End If
    Next
    
    Now I know that the above code is for the text of a link, but how could I do the same for meta tags, title tags, keyword tags, etc.? I have tried everything!

    For example, I tried to extract the inner text of TD elements with a given class using the following code, but nothing happens when the button is clicked:

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("TD")
        For Each curElement As HtmlElement In theElementCollection
            If curElement.GetAttribute("class").Equals("c0") Then
                TextBox1.Text += curElement.GetAttribute("innerText") & vbCrLf
            End If
        Next
    End Sub

    Could someone give some examples for a few different tags (preferably Title Tags, Keyword Tags & TD Class)?

    Any help would be great.

    Thanks!

Answers

  • Tuesday, November 10, 2009 8:34 AM, Cor Ligthert (MVP)
    VBNETMAN,

    In addition to Martin's reply:

    For what you want to do, the best solution is to use MSHTML, which is included with the full versions of Visual Studio.

    Be aware that it is not in the Express edition, so in that case you have to download the DLL separately. I have forgotten the address, but I assume that searching for it is something you can do yourself.

    Also be aware that MSHTML exposes an enormous number of members, so adding an Imports statement for it makes the code editor terribly slow because of IntelliSense.
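
    To make the MSHTML suggestion concrete, here is a minimal sketch, not a tested solution. It assumes the project has a reference to the Microsoft.mshtml interop assembly and that the page has finished loading; the module and function names are hypothetical. Types are written with the full mshtml prefix instead of importing the namespace.

    ```vbnet
    Imports System
    Imports System.Text
    Imports System.Windows.Forms

    ' Hypothetical helper module; requires a project reference to Microsoft.mshtml.
    Module MshtmlSketch

        ' Returns the inner text of every <tagName> element whose class equals cssClass,
        ' one entry per line, by going through the underlying MSHTML document.
        Public Function GetInnerTextByClassMshtml(ByVal browser As WebBrowser, _
                                                  ByVal tagName As String, _
                                                  ByVal cssClass As String) As String
            If browser.Document Is Nothing Then Return String.Empty

            ' DomDocument exposes the unmanaged document object wrapped by HtmlDocument.
            Dim doc As mshtml.IHTMLDocument3 = _
                TryCast(browser.Document.DomDocument, mshtml.IHTMLDocument3)
            If doc Is Nothing Then Return String.Empty

            Dim result As New StringBuilder()
            For Each el As mshtml.IHTMLElement In doc.getElementsByTagName(tagName)
                ' className holds the value of the HTML class attribute.
                If el.className = cssClass Then
                    result.AppendLine(el.innerText)
                End If
            Next
            Return result.ToString()
        End Function

    End Module
    ```

    For the TD example from the question, the call would look like TextBox1.Text = GetInnerTextByClassMshtml(WebBrowser1, "TD", "c0").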




    Success
    Cor
  • Tuesday, November 10, 2009 4:13 AM, Martin Xie - MSFT (Moderator)

    Hi VBNETMAN,

    Nice to see you here:)

    Generally, we can locate web page elements in WebBrowser.Document based on their attributes and then automate them (e.g. retrieve page text, click a button or hyperlink, etc.).
    However, the class attribute of elements such as <span> and <div> does not seem to be recognised by WebBrowser.
    Please check this thread for reference:
    http://social.msdn.microsoft.com/Forums/en/vbgeneral/thread/cc079961-57c8-441c-9529-a5f9fb1f6901
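
    A possible workaround, sketched here as an assumption rather than something confirmed in this thread: the Internet Explorer DOM that WebBrowser wraps generally exposes the class under the name "className", so GetAttribute("className") may return a value where GetAttribute("class") comes back empty. The module and function names below are made up for illustration.

    ```vbnet
    Imports System
    Imports System.Text
    Imports System.Windows.Forms

    ' Hypothetical helper module, not part of the original question.
    Module ClassNameSketch

        ' Returns the inner text of every <tagName> element whose class equals cssClass,
        ' one entry per line.
        Public Function GetInnerTextByClass(ByVal doc As HtmlDocument, _
                                            ByVal tagName As String, _
                                            ByVal cssClass As String) As String
            ' Document is Nothing until a page has been loaded.
            If doc Is Nothing Then Return String.Empty

            Dim result As New StringBuilder()
            For Each el As HtmlElement In doc.GetElementsByTagName(tagName)
                ' Assumption: the class is exposed as "className", not "class".
                If String.Equals(el.GetAttribute("className"), cssClass, StringComparison.Ordinal) Then
                    result.AppendLine(el.InnerText)
                End If
            Next
            Return result.ToString()
        End Function

    End Module
    ```

    From a button handler it could be called as TextBox1.Text = GetInnerTextByClass(WebBrowser1.Document, "TD", "c0"), once the page has finished loading. The comparison is an exact match, so an element carrying several classes would need its className split on spaces first.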


    Additionally, you can get the title of the document currently displayed in the WebBrowser control directly, like this (DocumentTitle is already a String, so no conversion is needed):
         Dim title As String = WebBrowser1.DocumentTitle
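
    The keyword tags asked about in the question are usually <meta name="keywords" content="..."> elements, so the same GetElementsByTagName approach should apply to them. The following is a minimal sketch under that assumption; the module and function names are hypothetical, and it returns an empty string when the page has no matching meta element.

    ```vbnet
    Imports System
    Imports System.Windows.Forms

    ' Hypothetical helper module, not part of the original answer.
    Module MetaTagSketch

        ' Returns the content attribute of the first <meta> element whose name
        ' attribute matches metaName (case-insensitive), or an empty string.
        Public Function GetMetaContent(ByVal doc As HtmlDocument, ByVal metaName As String) As String
            ' Document is Nothing until a page has been loaded.
            If doc Is Nothing Then Return String.Empty

            For Each el As HtmlElement In doc.GetElementsByTagName("meta")
                If String.Equals(el.GetAttribute("name"), metaName, StringComparison.OrdinalIgnoreCase) Then
                    Return el.GetAttribute("content")
                End If
            Next
            Return String.Empty
        End Function

    End Module
    ```

    For example, Dim keywords As String = GetMetaContent(WebBrowser1.Document, "keywords") would read the keywords, and passing "description" would read the description tag in the same way.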


    Best regards,
    Martin Xie


