HttpWebRequest fails to download entire HTML page.
-
Wednesday, 27 February 2008, 23:55
I'm using the following code to download an HTML page:
Imports System.IO
Imports System.Net
Imports System.Text

Dim lcUrl As String = "http://www.moen.com/products/F87420"
Dim loHttp As HttpWebRequest = CType(WebRequest.Create(lcUrl), HttpWebRequest)
Dim loWebResponse As HttpWebResponse
Dim enc As Encoding
Dim loResponseStream As StreamReader
Dim oSB As StringBuilder = New StringBuilder(1024000)
loHttp.Timeout = 60000 ' 60 seconds (Timeout is in milliseconds)
loWebResponse = CType(loHttp.GetResponse(), HttpWebResponse)
enc = Encoding.UTF8
loResponseStream = New StreamReader(loWebResponse.GetResponseStream(), enc)
oSB.Append(loResponseStream.ReadToEnd())
loResponseStream.Close() ' close the reader before the response it reads from
loWebResponse.Close()
When I check the contents of the StringBuilder object oSB, its length is 44,843 (approx. 44 KB), which is not correct.
The entire web page is approx. 60 KB; I know this for a fact, and searching for the closing </html> tag returns -1 (not present).
From my web browser (Firefox), "Save As" downloads the entire 60 KB page.
I get the same result with WebClient.DownloadFile. Any ideas why, and how to resolve this?
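Note that StringBuilder.Length counts UTF-16 characters, not bytes, so 44,843 characters and a 60 KB file are not directly comparable; the missing </html> is the stronger evidence of truncation. One way to check independently of string lengths is to compare the raw byte count with the server's Content-Length (when one is sent) and then look for the closing tag. A minimal C# sketch, assuming the body has already been read into a byte array; the class and method names are hypothetical, not part of .NET:

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper; not part of the .NET Framework.
public static class TruncationCheck
{
    // body:          the raw response bytes
    // contentLength: HttpWebResponse.ContentLength, or -1 when the server sent no
    //                Content-Length header (as with chunked responses)
    public static bool LooksTruncated(byte[] body, long contentLength, Encoding encoding)
    {
        // Fewer bytes than the server announced means the body was cut short.
        if (contentLength >= 0 && body.Length < contentLength)
        {
            return true;
        }

        // Otherwise fall back to looking for the document's closing tag.
        string html = encoding.GetString(body);
        return html.IndexOf("</html>", StringComparison.OrdinalIgnoreCase) < 0;
    }
}
```

With an HttpWebResponse, the second argument would be the response's ContentLength property.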
All Replies
-
Thursday, 28 February 2008, 00:15 (Moderator)
Using the Fiddler2 tool, I am seeing the same difference you mentioned.
I suspect the remote server reacts differently to the default header values that System.Net.HttpWebRequest sends than to those your web browser sends.
Try adding custom header values wherever the two differ, such as User-Agent, Accept, Accept-Language, Accept-Encoding, etc.
svs
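A sketch of that advice in C#, assuming .NET's HttpWebRequest. The header values below are placeholders, not what any particular browser sends; the real ones should come from a Fiddler capture. The class name is hypothetical. AutomaticDecompression is set because advertising gzip/deflate without it would hand compressed bytes straight to the StreamReader.

```csharp
using System;
using System.Net;

// Hypothetical helper: builds (but does not send) a browser-like request.
public static class BrowserLikeRequest
{
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = 60000; // 60 seconds, in milliseconds

        // Restricted headers must be set through properties, not Headers.Add.
        // These values are illustrative assumptions; copy your browser's.
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

        // Non-restricted headers can go into the Headers collection.
        request.Headers[HttpRequestHeader.AcceptLanguage] = "en-US,en;q=0.5";

        // Sends Accept-Encoding and transparently decompresses the body.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }
}
```

The response is then read exactly as in the original code, via GetResponse() and a StreamReader.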
-
Thursday, 28 February 2008, 14:47
SVS, thanks for the response.
I tried specifying the User-Agent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", which I found on MSDN online, but it didn't make any difference; I'm still not getting the entire file. I also tried code taken directly from MSDN, with the same results.
Is this a bug? I've downloaded a couple hundred pages with WebClient, many of them larger than 44 KB, and those worked fine. I wonder why this site is being a pain. Any more ideas to try would be appreciated.
Thanks
Roland
-
Friday, 29 February 2008, 18:10 (Moderator)
I am not sure either... Please download the Fiddler2 tool, capture a request sent by the IE browser, and try to add as many of its headers as you can so that they match.
It must be some magic header value the server cares about; I am not sure which. No other explanation comes to mind for why IE gets the full response while any other HTTP client gets a truncated one.
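When copying headers from a Fiddler capture, note that HttpWebRequest does not accept some "restricted" headers (User-Agent, Accept, Referer, Connection, Host and others) through its Headers collection; they have dedicated properties or are set by the framework. A minimal C# sketch, assuming the captured headers have been typed into a dictionary; the class and method names are hypothetical, and only the restricted headers a typical browser GET carries are routed.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper: copies headers captured from a browser (e.g. with
// Fiddler) onto an HttpWebRequest.
public static class CapturedHeaders
{
    public static void Apply(HttpWebRequest request, IDictionary<string, string> headers)
    {
        foreach (KeyValuePair<string, string> pair in headers)
        {
            switch (pair.Key.ToLowerInvariant())
            {
                // Restricted headers: use the matching property instead of Headers.
                case "user-agent":
                    request.UserAgent = pair.Value;
                    break;
                case "accept":
                    request.Accept = pair.Value;
                    break;
                case "referer":
                    request.Referer = pair.Value;
                    break;
                case "connection":
                    request.KeepAlive = !pair.Value.Equals("close", StringComparison.OrdinalIgnoreCase);
                    break;
                case "accept-encoding":
                    // Let the framework send Accept-Encoding and decompress the body,
                    // rather than handing gzip bytes to a StreamReader.
                    request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
                    break;
                case "host":
                case "content-length":
                    // Derived from the URL and request body by the framework.
                    break;
                default:
                    // Ordinary headers such as Accept-Language or Cookie.
                    request.Headers[pair.Key] = pair.Value;
                    break;
            }
        }
    }
}
```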
-
Saturday, 1 March 2008, 13:07
If you have a link to the file on the web, try this simple approach. It is in C#, but I hope you will manage:
Downloading Files:
There are two ways of downloading a file from a web site using WebClient, depending on whether we want to save the file or process its contents directly within the application. If we simply want to save the file, we call the DownloadFile() method, which takes two parameters: the URL from which we want to retrieve the file, and the file name (or path) to save it to.
WebClient client = new WebClient();
client.DownloadFile("http://www.csharpfriends.com/Members/index.aspx", "index.aspx");
More commonly, your application will want to process the data retrieved from the web site. To do this, you use the OpenRead() method, which returns a stream reference; you can then simply read the data from the stream.
The following code demonstrates the WebClient.OpenRead() method:
WebClient client = new WebClient();
Stream strm = client.OpenRead("http://www.csharpfriends.com/Members/index.aspx");
In this case we simply display the contents of the downloaded data in a list box.
We create the project as a standard Windows C# application, with a list box called listbox1 in which we display the contents of the downloaded file. We make the following changes to the constructor of the main form:
public form1()
{
    InitializeComponent();
    // Needs: using System.IO; using System.Net;
    WebClient client = new WebClient();
    using (StreamReader sr = new StreamReader(client.OpenRead("http://www.csharpfriends.com")))
    {
        string line;
        // Add lines until ReadLine returns null (end of stream),
        // so no trailing null item ends up in the list box.
        while ((line = sr.ReadLine()) != null)
        {
            listbox1.Items.Add(line);
        }
    } // disposing the reader also closes the underlying stream
}
-
Monday, 10 March 2008, 18:34
I've tried this approach before; it works for most files, but not all.
Thanks anyway.
-
Wednesday, 7 March 2012, 07:51
It is now four years later, I'm working on .NET 4, and I also have this issue. There is very little documentation about it.
This issue is blocking me. I think it has to do with the response being sent in chunks. I recreated the exact request made by my browser, which downloads the page correctly.
But my C# WebRequest still does not manage to download the complete site.
Help, MS, help!
Best regards
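For what it's worth, HttpWebResponse decodes Transfer-Encoding: chunked itself, and StreamReader.ReadToEnd already keeps reading until the end of the stream, so chunking by itself should not truncate the string. Reading the raw bytes in an explicit loop can still help with diagnosis, because the byte count can be compared directly with the response size Fiddler reports for the browser. A minimal C# sketch; the class name is hypothetical:

```csharp
using System.IO;

// Hypothetical helper: drains a stream into a byte array.
public static class StreamDrain
{
    public static byte[] ReadAll(Stream input)
    {
        var buffer = new byte[8192];
        using (var output = new MemoryStream())
        {
            int read;
            // A single Read may return fewer bytes than requested (for example,
            // whatever has arrived from the network so far), so keep reading
            // until it returns 0, which signals the end of the stream.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }
}
```

Usage with the original request would be along the lines of `byte[] body = StreamDrain.ReadAll(response.GetResponseStream());`, followed by comparing `body.Length` with the size seen in Fiddler.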