Visual Basic 2008: Extracting Div Tags, Title Tags and Keyword Tags, Parsing Div Tags, etc.
Hi friends!
I was just wondering how to extract or parse particular tags (whichever I specify) from webpages. I know how to extract text and links from webpages, but when I try to use the same method as in the following code for div tags, title tags, etc., it doesn't seem to work:

    Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
    For Each curElement As HtmlElement In theElementCollection
        If curElement.GetAttribute("href").Contains("http://twitter.com/") Then
            TextBox2.Text += curElement.GetAttribute("innerText") & vbCrLf
        End If
    Next

Now I know that the above code is for the text of a link, but how could I implement it for meta tags, title tags, keyword tags, etc.? I have tried everything!
For example, I tried to extract the inner text of TD elements with a specific class using the following code, but nothing happens when I click the button:
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("TD")
        For Each curElement As HtmlElement In theElementCollection
            If curElement.GetAttribute("class").Equals("c0") Then
                TextBox1.Text += curElement.GetAttribute("innerText") & vbCrLf
            End If
        Next
    End Sub
Could someone give examples for a few different tags (preferably title tags, keyword meta tags and TD classes)?
Any help would be great.
Thanks!
- Edited by Martin Xie - MSFT, Moderator, Tuesday, November 10, 2009 3:31 AM: Refined the thread title.
Answers
- VBNetman
In addition to Martin,
To do what you want, the best solution is to use MSHTML, which comes with the full versions of Visual Studio.
Be aware that it is not included in the Express edition, so you will have to download that DLL separately. I have forgotten the address, but I assume you can search for it yourself.
Also be aware that MSHTML contains an enormous number of types, so adding an Imports statement for it makes the code editor terribly slow because of IntelliSense.
Good luck
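A minimal sketch of the MSHTML route, assuming the project references the Microsoft.mshtml interop assembly (MshtmlCellReader, GetCellTextByClass and GetTitle are hypothetical names). It reaches the COM document through WebBrowser.Document.DomDocument and uses fully qualified mshtml.* names instead of an Imports statement, in line with the IntelliSense remark. Because IHTMLElement exposes className as a property, there is no attribute name to guess.

```vbnet
Imports System
Imports System.Text
Imports System.Windows.Forms

' Sketch only. Assumes a project reference to the Microsoft.mshtml interop assembly.
' Names are fully qualified (mshtml.*) rather than imported, to keep IntelliSense responsive.
Public Module MshtmlCellReader

    ' Returns the innerText of every <td> whose className equals wantedClass (exact match).
    Public Function GetCellTextByClass(ByVal browser As WebBrowser, ByVal wantedClass As String) As String
        Dim sb As New StringBuilder()
        If browser.Document Is Nothing Then Return String.Empty

        ' DomDocument is the underlying COM document object.
        Dim doc3 As mshtml.IHTMLDocument3 = TryCast(browser.Document.DomDocument, mshtml.IHTMLDocument3)
        If doc3 Is Nothing Then Return String.Empty

        For Each cell As mshtml.IHTMLElement In doc3.getElementsByTagName("td")
            ' className is a real property here; a Nothing value compares equal to "".
            If cell.className = wantedClass Then
                sb.AppendLine(cell.innerText)
            End If
        Next
        Return sb.ToString()
    End Function

    ' The title is also available directly from IHTMLDocument2.
    Public Function GetTitle(ByVal browser As WebBrowser) As String
        If browser.Document Is Nothing Then Return String.Empty
        Dim doc2 As mshtml.IHTMLDocument2 = TryCast(browser.Document.DomDocument, mshtml.IHTMLDocument2)
        If doc2 Is Nothing Then Return String.Empty
        Return doc2.title
    End Function

End Module
```

Call these only after the page has finished loading, for example from a DocumentCompleted handler: TextBox1.Text = MshtmlCellReader.GetCellTextByClass(WebBrowser1, "c0").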
Cor
- Marked As Answer by Martin Xie - MSFT, Moderator, Friday, November 13, 2009 3:13 AM
Hi VBNETMAN,
Nice to see you here:)
Generally, we can locate webpage elements in WebBrowser.Document based on their attributes and then automate them (e.g. retrieve page text, or click a button or hyperlink).
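For the keyword tags asked about in the question, a minimal sketch of such an attribute-based lookup might look like this (MetaTagReader, GetMetaContent and SplitKeywords are hypothetical names). It walks the <meta> elements, matches the name attribute case-insensitively and returns the content attribute; it assumes the document has finished loading.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

' Sketch: read <meta name="..." content="..."> values from a loaded HtmlDocument.
Public Module MetaTagReader

    ' Returns the content of the first <meta> whose name matches metaName
    ' (case-insensitive), or an empty string if there is none.
    Public Function GetMetaContent(ByVal doc As HtmlDocument, ByVal metaName As String) As String
        If doc Is Nothing Then Return String.Empty
        For Each meta As HtmlElement In doc.GetElementsByTagName("meta")
            If String.Equals(meta.GetAttribute("name"), metaName, StringComparison.OrdinalIgnoreCase) Then
                Return meta.GetAttribute("content")
            End If
        Next
        Return String.Empty
    End Function

    ' Splits a comma-separated keywords value into trimmed, non-empty entries.
    Public Function SplitKeywords(ByVal content As String) As List(Of String)
        Dim result As New List(Of String)
        If content Is Nothing Then Return result
        For Each part As String In content.Split(","c)
            Dim trimmed As String = part.Trim()
            If trimmed.Length > 0 Then result.Add(trimmed)
        Next
        Return result
    End Function

End Module
```

For example, in a WebBrowser1.DocumentCompleted handler: TextBox1.Text = String.Join(vbCrLf, MetaTagReader.SplitKeywords(MetaTagReader.GetMetaContent(WebBrowser1.Document, "keywords")).ToArray()). Passing "description" instead of "keywords" returns the description meta tag.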
However, the class attribute of elements such as <span> and <div> does not seem to be recognised by WebBrowser.
Please check this thread for reference:
http://social.msdn.microsoft.com/Forums/en/vbgeneral/thread/cc079961-57c8-441c-9529-a5f9fb1f6901
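One likely reason the TD code in the question finds nothing: in the WebBrowser control's default IE rendering mode, GetAttribute("class") tends to return an empty string, while GetAttribute("className") returns the value. The sketch below assumes that behaviour (TableCellReader, HasClass and GetTextByClass are hypothetical names). It tries "className" first, falls back to "class" for pages rendered in newer modes, and matches whole class tokens so that class="c0 odd" still counts as c0.

```vbnet
Imports System
Imports System.Text
Imports System.Windows.Forms
Imports Microsoft.VisualBasic

' Sketch: collect the inner text of elements with a given tag name and CSS class.
Public Module TableCellReader

    ' True when the space-separated class list contains wantedClass as a whole token.
    Public Function HasClass(ByVal classList As String, ByVal wantedClass As String) As Boolean
        If String.IsNullOrEmpty(classList) Then Return False
        Dim separators As Char() = New Char() {" "c, ControlChars.Tab}
        For Each token As String In classList.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            If token = wantedClass Then Return True
        Next
        Return False
    End Function

    ' Assumption: older IE modes expose the class as "className"; newer ones as "class".
    Public Function GetTextByClass(ByVal doc As HtmlDocument, ByVal tagName As String, _
            ByVal wantedClass As String) As String
        Dim sb As New StringBuilder()
        If doc Is Nothing Then Return String.Empty
        For Each el As HtmlElement In doc.GetElementsByTagName(tagName)
            Dim classList As String = el.GetAttribute("className")
            If String.IsNullOrEmpty(classList) Then classList = el.GetAttribute("class")
            If HasClass(classList, wantedClass) Then
                sb.AppendLine(el.InnerText)
            End If
        Next
        Return sb.ToString()
    End Function

End Module
```

In the question's Button1_Click this would become TextBox1.Text = TableCellReader.GetTextByClass(WebBrowser1.Document, "td", "c0"). TextBox1 needs Multiline set to True to show one line per cell, and the page must have finished loading before the button is clicked.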
Additionally, you can directly get the title of the document currently displayed in the WebBrowser object like this:
Dim title As String = WebBrowser1.DocumentTitle
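DocumentTitle is only meaningful once the page has loaded, so one way to read it is in the DocumentCompleted event. The self-contained form below is a sketch; TitleForm, PageTitleOrUrl and the example.com URL are placeholders. Because DocumentCompleted can fire once per frame, it ignores events whose Url differs from the browser's own Url, and it falls back to the URL for pages without a <title>.

```vbnet
Imports System
Imports System.Windows.Forms

' Sketch: show a page's title in the form caption once the page has loaded.
Public Class TitleForm
    Inherits Form

    Private WithEvents browser As New WebBrowser()

    Public Sub New()
        browser.Dock = DockStyle.Fill
        browser.ScriptErrorsSuppressed = True
        Controls.Add(browser)
    End Sub

    Protected Overrides Sub OnLoad(ByVal e As EventArgs)
        MyBase.OnLoad(e)
        browser.Navigate("http://example.com/") ' placeholder URL
    End Sub

    Private Sub browser_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) Handles browser.DocumentCompleted
        ' Skip completions raised for frames; only the top-level page matters here.
        If Not e.Url.Equals(browser.Url) Then Return
        Me.Text = PageTitleOrUrl(browser.DocumentTitle, browser.Url)
    End Sub

    ' Returns the trimmed title, or the URL when the page has no title.
    Public Shared Function PageTitleOrUrl(ByVal title As String, ByVal url As Uri) As String
        If Not String.IsNullOrEmpty(title) AndAlso title.Trim().Length > 0 Then
            Return title.Trim()
        End If
        If url Is Nothing Then Return String.Empty
        Return url.ToString()
    End Function

End Class
```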
Best regards,
Martin Xie
- Marked As Answer by Martin Xie - MSFT, Moderator, Friday, November 13, 2009 3:13 AM
- Proposed As Answer by Cor Ligthert (MVP), Tuesday, November 10, 2009 8:35 AM


