Answered VB: How to retrieve HTML from SHDocVW?

  • Monday, March 05, 2012 10:50 PM
     

    Hi,

    I have a question about the SHDocVw Internet Explorer control.

    I added a reference to "Microsoft Internet Controls".

    Then I declared:

    Dim WithEvents MyBrowser1 As SHDocVw.InternetExplorer

    Then I created a new instance and assigned an event handler for the "DocumentComplete" event:

            MyBrowser1 = New SHDocVw.InternetExplorer
            MyBrowser1.Visible = True
            AddHandler MyBrowser1.DocumentComplete, AddressOf DocumentComplete_browser_1

    Since the DocumentComplete event fires multiple times, I used a conditional check on "ReadyState" to detect when the full web page has finished downloading. (Is this correct?)

    When completion is detected, I try to get the HTML content by concatenating the innerHTML of all items:

    Private Sub DocumentComplete_browser_1()
        If MyBrowser1.ReadyState = WebBrowserReadyState.Complete Then
            Dim myHtml As String = ""
            If Not IsNothing(MyBrowser1.Document.All) Then 'If True Then
                For i = 1 To MyBrowser1.Document.All.Length - 1
                    If Not IsNothing(MyBrowser1.Document.All.Item(i)) AndAlso _
                       Not IsNothing(MyBrowser1.Document.All.Item(i).innerHTML) Then
                        myHtml = myHtml + MyBrowser1.Document.All.Item(i).innerHTML
                    End If
                Next
            End If
        End If
    End Sub

    But the above code frequently runs into an error and stops.

    Could anyone give a known-good working example of how SHDocVw can be used to retrieve HTML content reliably using VB?

    Bob

All Replies

  • Monday, March 05, 2012 10:55 PM
     

    Hello Bob Sun,

    Follow this thread: http://social.msdn.microsoft.com/Forums/en-US/wpf/thread/f40e35e4-16f0-4353-a783-3ef136e57697

    Regards.


  • Wednesday, March 07, 2012 5:13 AM
     
     

    Hi Bob,

    Please note that the sample from Carmelo is a WPF sample.

    Here are some samples about getting HTML content; I hope they help:

    How to read the current active IE window's HTML content: http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/415f8b61-7db4-48ab-ab5c-71721afac718/

    ByPass difficult Automation and add applications "as is" in your .NET application: http://www.codeproject.com/Articles/8905/ByPass-difficult-Automation-and-add-applications-q

    Manipulate/Change/Form Fill data in webpages using the Webbrowser control: http://www.vbforums.com/showthread.php?t=416275 

    You can convert the C# code to VB.NET with the help of this link: http://www.developerfusion.com/tools/convert/csharp-to-vb/


    No code, No fact.

  • Wednesday, March 07, 2012 9:35 AM
     

    Hello Bob, 

    I did this about 10 years ago and have not done it since; however, you are going in the wrong direction.

    What you are doing uses the .NET WebBrowser control (either Windows Forms or WPF), while you wrote that you want to use Internet Explorer. Because of all kinds of security restrictions, I don't even know whether that is still possible.

    However, this code looks a bit more like it, and it retrieves the documents.

    Public Class Form1
        Dim WithEvents IE As SHDocVw.InternetExplorer
        Dim docs As New List(Of Object)
        Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
            IE = New SHDocVw.InternetExplorer
            IE.Visible = True
            IE.Navigate2("Http://www.microsoft.com")
        End Sub
        Private Sub IE_DocumentComplete(pDisp As Object, ByRef URL As Object) Handles IE.DocumentComplete
            Dim wb As SHDocVw.WebBrowser = DirectCast(pDisp, SHDocVw.WebBrowser)
            docs.Add(wb.Document)
        End Sub
    End Class

    For this you have to use MSHTML, in a similar way to what is done on this page on our website:

    http://www.vb-tips.com/dbpages.aspx?Search=mshtml
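
As a rough illustration of combining the DocumentComplete handler with MSHTML, here is a minimal sketch. It assumes COM references to both "Microsoft Internet Controls" (SHDocVw) and "Microsoft HTML Object Library" (mshtml). The class name PageGrabber, the PageHtml field and the Open method are hypothetical names chosen for this example. It also assumes that the pDisp passed for the top-level page is the same object as the InternetExplorer instance, which is the usual way to tell the final DocumentComplete from the ones raised for individual frames.

```vb
' Sketch only: assumes references to SHDocVw and mshtml (see the text above).
Public Class PageGrabber
    Private WithEvents IE As SHDocVw.InternetExplorer

    ' Hypothetical field holding the markup of the last fully loaded page.
    Public PageHtml As String = ""

    Public Sub Open(url As String)
        IE = New SHDocVw.InternetExplorer()
        IE.Visible = True
        ' Navigate2 takes its URL ByRef As Object, so pass an Object local.
        Dim target As Object = url
        IE.Navigate2(target)
    End Sub

    Private Sub IE_DocumentComplete(pDisp As Object, ByRef URL As Object) Handles IE.DocumentComplete
        ' DocumentComplete is raised once per frame. Assumption: only the
        ' event whose pDisp is the browser itself belongs to the whole page.
        If pDisp IsNot IE Then Return

        ' The document may not be HTML at all (for example a PDF viewer),
        ' in which case the cast yields Nothing.
        Dim doc3 As mshtml.IHTMLDocument3 = TryCast(IE.Document, mshtml.IHTMLDocument3)
        If doc3 Is Nothing OrElse doc3.documentElement Is Nothing Then Return

        ' outerHTML of the root element gives the whole page in one call,
        ' so there is no need to walk Document.All.
        PageHtml = doc3.documentElement.outerHTML
    End Sub
End Class
```

Reading the root element once avoids concatenating innerHTML from every element, which repeats nested markup many times over.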


    Success
    Cor

  • Sunday, March 11, 2012 4:07 PM
     
     Answered

    Carmelo, calanghei and Cor,

    I eventually verified that the methods from many samples, including mine in the first post, all work for normal sites. The reason the code "frequently runs into error and stops" was on the server side, not the code alone. A more robust version needs more error handling in the code, which requires some knowledge of HTML and related topics.
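
The kind of error handling meant here might look like the sketch below. It is not the code from this thread: the module name HtmlReader and the function TryGetHtml are hypothetical, and it assumes references to "Microsoft Internet Controls" (SHDocVw) and "Microsoft HTML Object Library" (mshtml). The idea is simply to return Nothing instead of letting a failed page load stop the program.

```vb
Imports System.Runtime.InteropServices

' Sketch only: assumes references to SHDocVw and mshtml.
Module HtmlReader
    ' Returns the page markup, or Nothing when it cannot be read.
    Public Function TryGetHtml(browser As SHDocVw.InternetExplorer) As String
        Try
            ' Skip the read while the browser is missing or still loading.
            If browser Is Nothing OrElse browser.Busy Then Return Nothing

            ' Non-HTML documents do not implement IHTMLDocument3.
            Dim doc3 As mshtml.IHTMLDocument3 = TryCast(browser.Document, mshtml.IHTMLDocument3)
            If doc3 Is Nothing OrElse doc3.documentElement Is Nothing Then Return Nothing

            Return doc3.documentElement.outerHTML
        Catch ex As COMException
            ' For example: the window was closed or the server returned an error page.
            Return Nothing
        Catch ex As UnauthorizedAccessException
            ' Access denied by the browser, typically for cross-domain content.
            Return Nothing
        End Try
    End Function
End Module
```

A caller would check the result for Nothing and decide whether to retry the navigation or skip the page.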

    Bob