Extract HTML from a redirected page

Question
-
Hello,
I am using Visual Basic 2005. I found the following function on the web; it extracts the HTML from a web page. It is very useful, but unfortunately it does not work with redirected pages: when I give it the URL of a redirecting page, it returns nothing or an error. I added ".AllowAutoRedirect = True" to it, but it still did not work. How can I make it work for redirected pages?
I appreciate the help.

Public Function GetPageHTML(ByVal URL As String, _
        Optional ByVal TimeoutSeconds As Integer = 10) _
        As String
    ' Retrieves the HTML from the specified URL,
    ' using a default timeout of 10 seconds
    Dim objRequest As Net.HttpWebRequest
    Dim objResponse As Net.HttpWebResponse
    Dim objStreamReceive As System.IO.Stream
    Dim objEncoding As System.Text.Encoding
    Dim objStreamRead As System.IO.StreamReader

    Try
        ' Setup our Web request
        objRequest = Net.WebRequest.Create(URL)
        objRequest.Method = "GET"
        objRequest.KeepAlive = True
        objRequest.AllowAutoRedirect = True
        objRequest.Timeout = TimeoutSeconds * 1000
        ' Retrieve data from request
        objResponse = objRequest.GetResponse()
        objStreamReceive = objResponse.GetResponseStream
        objEncoding = System.Text.Encoding.GetEncoding( _
            "utf-8")
        objStreamRead = New System.IO.StreamReader( _
            objStreamReceive, objEncoding)
        ' Set function return value
        GetPageHTML = objStreamRead.ReadToEnd()
        ' Check if available, then close response
        If Not objResponse Is Nothing Then
            objResponse.Close()
        End If
    Catch
        Return "error"
    End Try
End Function
- Edited by Hani Deek Thursday, November 24, 2011 4:50 PM
Thursday, November 24, 2011 4:47 PM
All replies
-
when I put in it a URL of a redirect page it gives me nothing or error.
And what is the text of that error? Ah, I see you are using Try..Catch. I suggest that you comment out the Try..Catch parts and see what the actual error is. There is no point hiding the problem from yourself ;)
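A minimal sketch of one way to surface the real error without deleting the error handling altogether. It assumes a console program; the name GetPageHtmlVerbose is made up for illustration and is not part of the original function. Catching WebException gives access to its Status and, for HTTP-level failures, to the response that carries the status code.

```vb
Imports System
Imports System.IO
Imports System.Net

Module FetchDemo
    ' Hypothetical helper: the same kind of request as GetPageHTML, but the
    ' failure reason is reported instead of being replaced by "error".
    Public Function GetPageHtmlVerbose(ByVal url As String) As String
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AllowAutoRedirect = True
        request.Timeout = 10000 ' milliseconds

        Try
            Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
                Using reader As New StreamReader(response.GetResponseStream())
                    Return reader.ReadToEnd()
                End Using
            End Using
        Catch ex As WebException
            ' Status tells what kind of failure occurred (ProtocolError, Timeout, ...).
            Console.WriteLine("Request failed: " & ex.Status.ToString() & " - " & ex.Message)
            ' For protocol errors the server's HTTP status code is available.
            Dim errorResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If errorResponse IsNot Nothing Then
                Console.WriteLine("HTTP status: " & CInt(errorResponse.StatusCode).ToString())
            End If
            Throw ' rethrow so the caller still sees the failure
        End Try
    End Function
End Module
```

The bare Catch in the original swallows every exception type; narrowing it to WebException keeps unrelated bugs visible while still reporting network failures.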
--
Andrew
- Marked as answer by Hani Deek Wednesday, November 30, 2011 7:21 AM
Thursday, November 24, 2011 8:03 PM -
Hello Andrew,
It gives me different error messages each time, and sometimes it does not give me an error message but it just returns nothing or blank text (I guess this is what is called a logical error).
When I enter the following URL:
http://en.wikipedia.org/wiki/usa
The most common error message I get is:
The remote server returned an error: (403) Forbidden.
So it is either error or empty text.
In the debug mode, the following line was highlighted:
objResponse = objRequest.GetResponse
- Edited by Hani Deek Thursday, November 24, 2011 8:53 PM
Thursday, November 24, 2011 8:47 PM -
WebClient sounds a lot easier to use in this case.
The DownloadString() sounds perfect.
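A minimal sketch of DownloadString in a console program. Some servers reject requests that carry no User-Agent header; that this is what causes the 403 reported earlier in the thread is an assumption, and the header value below is a placeholder, not a required string.

```vb
Imports System
Imports System.Net

Module DownloadDemo
    Sub Main()
        Using client As New WebClient()
            ' Assumption: the server wants a User-Agent header.
            ' "MyPageReader/1.0" is a placeholder value.
            client.Headers.Add("User-Agent", "MyPageReader/1.0")

            ' Downloads the resource at the URL and returns it as a String.
            Dim html As String = client.DownloadString("http://en.wikipedia.org/wiki/usa")
            Console.WriteLine("Downloaded " & html.Length.ToString() & " characters")
        End Using
    End Sub
End Module
```

DownloadString throws a WebException on failure, so the same error inspection as for HttpWebRequest applies.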
Regards, MusicDemon
- Marked as answer by Hani Deek Wednesday, November 30, 2011 7:21 AM
Thursday, November 24, 2011 11:00 PM -
Could you please show me how to use it in code? I appreciate your help.
Friday, November 25, 2011 5:14 PM
-
Could you PLEASE click the link?
Regards, MusicDemon
Friday, November 25, 2011 5:16 PM -
I looked at the link at first, but I did not know how to use it because I only started learning about classes and objects yesterday. I don't know much about programming. The code in the question was not written by me.
Anyway, I have managed to try the code you posted and you are right: it replaces all of the code that I posted with only two lines. It is amazing. Thank you for posting it.
However, it still did not work with redirected pages, which is the main problem. Is there a way to access a redirected page through Visual Basic?
Saturday, November 26, 2011 2:09 AM -
Hi hd86,
There are many ways to get information from a web page. The simplest is to use the WebBrowser class. You only need to add a WebBrowser control to the form of a Windows application and add the following code:

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        WebBrowser1.Navigate("http://en.wikipedia.org/wiki/usa")
    End Sub

If you want to get the data from the web page, here are two suggestions:
1. Use WebBrowser Class: http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser(v=VS.80).aspx
2. Use MSHTML: http://www.vb-tips.com/MSHTML.aspx
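For the first suggestion, a minimal sketch of reading the HTML once the page has loaded. It assumes a Windows Forms project; the class name BrowserForm and the field name Browser are made up for illustration, and the control is created in code only to keep the example self-contained (a control dropped on the form in the designer behaves the same way).

```vb
Imports System
Imports System.Windows.Forms

Public Class BrowserForm
    Inherits Form

    ' WithEvents lets the Handles clause below attach to the control's events.
    Private WithEvents Browser As New WebBrowser()

    Public Sub New()
        Browser.Dock = DockStyle.Fill
        Controls.Add(Browser)
    End Sub

    Protected Overrides Sub OnLoad(ByVal e As EventArgs)
        MyBase.OnLoad(e)
        Browser.Navigate("http://en.wikipedia.org/wiki/usa")
    End Sub

    ' Raised when a document finishes loading; pages with frames can raise it several times.
    Private Sub Browser_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles Browser.DocumentCompleted
        ' React only to the top-level document, not to individual frames.
        If e.Url.Equals(Browser.Url) Then
            Dim html As String = Browser.DocumentText
            ' Show the final address and the size of the HTML in the title bar.
            Text = Browser.Url.ToString() & " (" & html.Length.ToString() & " characters)"
        End If
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New BrowserForm())
    End Sub
End Class
```

Because the control behaves like a browser, Browser.Url holds the address the page ended up at after any redirects, and DocumentText is only meaningful after DocumentCompleted has fired.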
If you have any additional questions, please feel free to let me know.
Mark Liu-lxf [MSFT]
MSDN Community Support | Feedback to us
- Marked as answer by Hani Deek Wednesday, November 30, 2011 7:21 AM
Monday, November 28, 2011 10:54 AM Moderator -
hd86, does the website you want to extract from use a JavaScript redirect, or a META / header redirect?
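The distinction matters because AllowAutoRedirect follows only HTTP-level (3xx) redirects; a META refresh or a JavaScript redirect arrives as an ordinary page that the code must interpret itself. A rough sketch of detecting a META refresh in downloaded HTML follows. The function name FindMetaRefreshUrl and the sample markup are made up for illustration, and the pattern assumes http-equiv appears before content and that the URL inside content is not itself quoted.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RedirectDemo
    ' Hypothetical helper: returns the target of a <meta http-equiv="refresh"> tag,
    ' or Nothing when no such tag is found.
    Public Function FindMetaRefreshUrl(ByVal html As String) As String
        Dim pattern As String = _
            "<meta[^>]*http-equiv\s*=\s*[""']?refresh[""']?[^>]*" & _
            "content\s*=\s*[""'][^""'>]*url\s*=\s*([^""'>\s]+)"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim sample As String = _
            "<html><head><meta http-equiv=""refresh"" " & _
            "content=""0; url=http://example.com/new""></head></html>"
        Console.WriteLine(FindMetaRefreshUrl(sample)) ' prints http://example.com/new
    End Sub
End Module
```

A JavaScript redirect cannot be found this way in general, since the script has to run; that case is where the WebBrowser control is the more practical route.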
Regards, MusicDemon
Monday, November 28, 2011 4:35 PM -
Yes. Actually, I have learned a lot over the past few days. I have discovered that you can put a WebBrowser control on a form other than Form1 and then browse any web page you want without showing that form. This solves everything, although I am still having some trouble controlling the web browser.
- Edited by Hani Deek Wednesday, November 30, 2011 7:26 AM
Wednesday, November 30, 2011 7:25 AM -
You don't have to put the WebBrowser object on another form; you can set its Visible property to False or its size to 0;0.
Regards, MusicDemon
Wednesday, November 30, 2011 12:34 PM