Extract HTML from a redirected page

  • Question

  • Hello,

    I am using Visual Basic 2005. I found the following function on the web; it extracts the HTML from a web page. It is very useful, but unfortunately it does not work with redirected pages: when I pass it the URL of a redirecting page, it returns nothing or an error. I added ".AllowAutoRedirect = True" to it, but it still did not work. How can I make it work for redirected pages?

    I appreciate the help.

     

    Public Function GetPageHTML(ByVal URL As String, _
            Optional ByVal TimeoutSeconds As Integer = 10) _
            As String
        ' Retrieves the HTML from the specified URL,
        ' using a default timeout of 10 seconds
        Dim objRequest As Net.HttpWebRequest
        Dim objResponse As Net.HttpWebResponse
        Dim objStreamReceive As System.IO.Stream
        Dim objEncoding As System.Text.Encoding
        Dim objStreamRead As System.IO.StreamReader

        Try
            ' Set up our Web request
            objRequest = Net.WebRequest.Create(URL)
            objRequest.Method = "GET"
            objRequest.KeepAlive = True
            objRequest.AllowAutoRedirect = True
            objRequest.Timeout = TimeoutSeconds * 1000
            ' Retrieve data from request
            objResponse = objRequest.GetResponse()
            objStreamReceive = objResponse.GetResponseStream
            objEncoding = System.Text.Encoding.GetEncoding( _
                "utf-8")
            objStreamRead = New System.IO.StreamReader( _
                objStreamReceive, objEncoding)
            ' Set function return value
            GetPageHTML = objStreamRead.ReadToEnd()
            ' Check if available, then close response
            If Not objResponse Is Nothing Then
                objResponse.Close()
            End If
        Catch
            Return "error"
        End Try
    End Function


    • Edited by Hani Deek Thursday, November 24, 2011 4:50 PM
    Thursday, November 24, 2011 4:47 PM

All replies

  • when I put in it a URL of a redirect page it gives me nothing or error.

    And what is the text of that error? Ah, I see you are using Try..Catch. I suggest that you comment out the Try..Catch parts and see what the actual error is. There is no point hiding the problem from yourself ;)
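
    As a minimal sketch of that advice, the bare Catch could be replaced with one that reports what actually went wrong. The helper name DescribeWebError is hypothetical and is not part of the original function; WebException.Response, WebException.Status and HttpWebResponse.StatusCode are standard System.Net members.

```vbnet
Imports System
Imports System.Net

Module ErrorDetailDemo
    ' Hypothetical helper: turns a WebException into a readable message
    ' instead of hiding it behind a generic "error" string.
    Public Function DescribeWebError(ByVal ex As WebException) As String
        Dim httpResponse As HttpWebResponse = _
            TryCast(ex.Response, HttpWebResponse)
        If httpResponse IsNot Nothing Then
            ' The server answered, but with an error status such as 403 or 404.
            Return String.Format("HTTP {0} ({1}): {2}", _
                CInt(httpResponse.StatusCode), _
                httpResponse.StatusDescription, ex.Message)
        End If
        ' No HTTP response at all (timeout, DNS failure, and so on).
        Return String.Format("{0}: {1}", ex.Status, ex.Message)
    End Function

    Sub Main()
        Try
            Dim request As HttpWebRequest = DirectCast( _
                WebRequest.Create("http://en.wikipedia.org/wiki/usa"), _
                HttpWebRequest)
            Using response As WebResponse = request.GetResponse()
                Console.WriteLine("Request succeeded.")
            End Using
        Catch ex As WebException
            Console.WriteLine(DescribeWebError(ex))
        End Try
    End Sub
End Module
```

    Catching WebException specifically, rather than everything, also lets unrelated bugs surface in the debugger.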

    --
    Andrew

    • Marked as answer by Hani Deek Wednesday, November 30, 2011 7:21 AM
    Thursday, November 24, 2011 8:03 PM
  • Hello Andrew,

    It gives me different error messages each time, and sometimes it does not give me an error message but it just returns nothing or blank text (I guess this is what is called a logical error).

    When I enter the following URL:

    http://en.wikipedia.org/wiki/usa

    The most common error message I get is:

    The remote server returned an error: (403) Forbidden.

    So it is either error or empty text.

    In the debug mode, the following line was highlighted:

    objResponse = objRequest.GetResponse
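
    One plausible cause of the 403, not confirmed anywhere in this thread: HttpWebRequest sends no User-Agent header unless one is set, and some sites, Wikipedia among them, may reject requests that lack one. The sketch below sets the header before calling GetResponse. The function name GetPageHtmlWithUserAgent and the User-Agent string are placeholders chosen for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module UserAgentDemo
    ' Hypothetical variant of GetPageHTML that identifies the client.
    Public Function GetPageHtmlWithUserAgent(ByVal url As String) As String
        Dim request As HttpWebRequest = _
            DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AllowAutoRedirect = True
        ' Placeholder value (an assumption); use a string that identifies
        ' your own application.
        request.UserAgent = "MyVbScraper/1.0 (contact@example.com)"

        ' Using blocks close the response and reader even if reading fails.
        Using response As HttpWebResponse = _
                DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader( _
                    response.GetResponseStream(), Encoding.UTF8)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```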

     

     

    • Edited by Hani Deek Thursday, November 24, 2011 8:53 PM
    Thursday, November 24, 2011 8:47 PM
  • WebClient sounds a lot easier to use in this case.

    The DownloadString() sounds perfect.
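
    A minimal sketch of that suggestion, assuming the target page is UTF-8 encoded. The module and function names are hypothetical; WebClient.DownloadString and WebClient.Encoding are standard System.Net members.

```vbnet
Imports System
Imports System.Net
Imports System.Text

Module WebClientDemo
    ' Hypothetical helper: downloads a page as text with WebClient.
    Public Function DownloadHtml(ByVal url As String) As String
        Using client As New WebClient()
            ' Assumption: the page is UTF-8. DownloadString decodes the
            ' response bytes with this encoding.
            client.Encoding = Encoding.UTF8
            Return client.DownloadString(url)
        End Using
    End Function

    Sub Main()
        Dim html As String = DownloadHtml("http://en.wikipedia.org/wiki/usa")
        Console.WriteLine("Downloaded {0} characters.", html.Length)
    End Sub
End Module
```

    WebClient follows HTTP-level redirects by default, but like HttpWebRequest it does not run JavaScript or act on META refresh tags.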


    Regards, MusicDemon
    • Marked as answer by Hani Deek Wednesday, November 30, 2011 7:21 AM
    Thursday, November 24, 2011 11:00 PM
  • Could you please show me how to use it in code? I appreciate your help.
    Friday, November 25, 2011 5:14 PM
  • Could you PLEASE click the link?

    Regards, MusicDemon
    Friday, November 25, 2011 5:16 PM
  • I looked in the link at first but I did not know how to use it because I have started learning about classes and objects only yesterday. I don't know much in programming. The code in the question was not written by me.

    Anyway, I have managed to try the code you posted and you are right, it replaces the whole code that I posted with only two lines. It is amazing. Thank you for posting it.

    However, it still did not work with redirected pages, which is the main problem. Is there a way that I can access a redirected page through visual basic?
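
    One way to check whether an HTTP-level redirect is being followed at all is to compare the URL that was requested with the URL that finally answered. This is only a diagnostic sketch; the helper name DescribeRedirect is hypothetical, while HttpWebRequest.RequestUri and HttpWebResponse.ResponseUri are standard members.

```vbnet
Imports System
Imports System.Net

Module RedirectCheckDemo
    ' Hypothetical helper: reports where a request ended up after any
    ' HTTP-level (3xx) redirects were followed.
    Public Function DescribeRedirect(ByVal url As String) As String
        Dim request As HttpWebRequest = _
            DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AllowAutoRedirect = True

        Using response As HttpWebResponse = _
                DirectCast(request.GetResponse(), HttpWebResponse)
            If response.ResponseUri.Equals(request.RequestUri) Then
                Return "No HTTP redirect was followed; final URL is " & _
                    response.ResponseUri.ToString()
            End If
            Return "Redirected from " & request.RequestUri.ToString() & _
                " to " & response.ResponseUri.ToString()
        End Using
    End Function
End Module
```

    If this reports no redirect although a browser ends up on a different page, the redirect is probably done in the page itself (JavaScript or a META refresh tag), which neither HttpWebRequest nor WebClient will execute.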

     

     

    Saturday, November 26, 2011 2:09 AM
  • Hi hd86,

    There are many ways to get information from a web page. The simplest is to use the WebBrowser class. You only need to add a WebBrowser control to the Windows application form and add the following code:

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        WebBrowser1.Navigate("http://en.wikipedia.org/wiki/usa")
    End Sub



    If you want to get the data from the web page, here are two suggestions:

    1.  Use WebBrowser Class: http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser(v=VS.80).aspx

    2.  Use MSHTML: http://www.vb-tips.com/MSHTML.aspx
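
    Navigate returns before the page has loaded, so the HTML is normally read in the DocumentCompleted event. The sketch below assumes a form named Form1 with a designer-added control named WebBrowser1, as in the snippet above; showing the first 200 characters in a message box is purely for illustration.

```vbnet
Imports System
Imports System.Windows.Forms

Public Class Form1
    Private Sub Form1_Load(ByVal sender As Object, ByVal e As EventArgs) _
            Handles MyBase.Load
        WebBrowser1.Navigate("http://en.wikipedia.org/wiki/usa")
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles WebBrowser1.DocumentCompleted
        ' The event can fire once per frame; only react when the completed
        ' URL is the one the control reports as its current page.
        If e.Url.Equals(WebBrowser1.Url) Then
            Dim html As String = WebBrowser1.DocumentText
            MessageBox.Show(html.Substring(0, Math.Min(200, html.Length)))
        End If
    End Sub
End Class
```

    Because the control is a real browser engine, it follows script and META redirects as well as HTTP ones, at the cost of being slower and requiring a UI thread.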

    If you have any additional questions, please feel free to let me know.

     


    Mark Liu-lxf [MSFT]
    • Marked as answer by Hani Deek Wednesday, November 30, 2011 7:21 AM
    Monday, November 28, 2011 10:54 AM
    Moderator
  • hd86, does the website you want to extract use a JavaScript redirector or a META / header redirect?
    Regards, MusicDemon
    Monday, November 28, 2011 4:35 PM
  • Yes. Actually, I have learned a lot over the past few days. I have discovered that you can put a web browser on a form other than Form1 and then browse any web page you want without showing the form. This solves everything, although I am still having some trouble controlling the web browser.

    • Edited by Hani Deek Wednesday, November 30, 2011 7:26 AM
    Wednesday, November 30, 2011 7:25 AM
  • You don't have to put the WebBrowser control on another form; you can set its Visible property to False or its size to 0;0.
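
    A rough sketch of the hidden-control approach, creating the WebBrowser in code rather than in the designer. The class name HiddenBrowserForm and the field name hiddenBrowser are hypothetical, and writing the result to the form's title bar is only for illustration.

```vbnet
Imports System
Imports System.Windows.Forms

Public Class HiddenBrowserForm
    Inherits Form

    ' WithEvents lets the Handles clause below attach to the control.
    Private WithEvents hiddenBrowser As New WebBrowser()

    Public Sub New()
        hiddenBrowser.Visible = False               ' never shown to the user
        hiddenBrowser.ScriptErrorsSuppressed = True ' no script error dialogs
        Controls.Add(hiddenBrowser)
    End Sub

    Protected Overrides Sub OnLoad(ByVal e As EventArgs)
        MyBase.OnLoad(e)
        hiddenBrowser.Navigate("http://en.wikipedia.org/wiki/usa")
    End Sub

    Private Sub hiddenBrowser_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles hiddenBrowser.DocumentCompleted
        If e.Url.Equals(hiddenBrowser.Url) Then
            ' Url now holds the address reached after any redirects.
            Text = "Final URL: " & hiddenBrowser.Url.ToString() & _
                " (" & hiddenBrowser.DocumentText.Length.ToString() & _
                " characters)"
        End If
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New HiddenBrowserForm())
    End Sub
End Class
```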
    Regards, MusicDemon
    Wednesday, November 30, 2011 12:34 PM