WebClient returns a 404 error, but this URL works in the web browser

  • Question

  • Hi,

    I am trying to download an HTML page using WebClient. Please see my code below.

      public string GetWebData(string url)
      {
          string html = string.Empty;
          using (WebClient client = new WebClient())
          {
              Uri innUri = null;
              Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out innUri);

              try
              {
                  client.Headers.Add("Accept-Language", "en-US");
                  client.Headers.Add("Accept-Encoding", "gzip, deflate");
                  client.Headers.Add("Accept", "text/html, application/xhtml+xml, */*");
                  client.Headers.Add("User-Agent", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)");

                  using (StreamReader str = new StreamReader(client.OpenRead(innUri)))
                  {
                      html = str.ReadToEnd();
                  }
              }
              catch (WebException)
              {
                  // Rethrow without resetting the stack trace.
                  throw;
              }
              return html;
          }
      }

    The URL is http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.

    The URL opens fine in IE9, Firefox and Chrome. I used Fiddler to investigate the problem.

    I found that the URL is changed when WebClient sends the request (see the image below).

    Actual URL: http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.

    Please note the difference: the dot at the end of the URL is removed in the request. Without the dot, the URL does not work in the browsers either (IE9, Firefox, Chrome). How can I make the request use the actual URL?

    Please help me.


    rageshS


    • Edited by RageshShiva Wednesday, November 7, 2012 6:17 AM
    Monday, November 5, 2012 8:12 AM

Answers

  • Thanks to all members for helping.

    The real problem is the URL ending with a dot: http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.

    In my code I convert this string URL to a System.Uri:

     Uri innUri = null;
     Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out innUri);

    After the Uri conversion the dot at the end of the URL was missing. I was really puzzled by this bug.

    I found another solution that uses a different way to convert the string URL to a System.Uri.

    // Needs "using System;" and "using System.Reflection;".
    // Relies on non-public members of UriParser, so it may not work on other framework versions.
    System.Uri ManipulateBrokenUrl_Ragesh(string surl)
    {
        // Parsed before the change: this is where the trailing dot was being dropped.
        var url = new Uri(surl);
        //Console.WriteLine("Broken: " + url.ToString());

        MethodInfo getSyntax = typeof(UriParser).GetMethod("GetSyntax", System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.NonPublic);
        FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
        if (getSyntax != null && flagsField != null)
        {
            foreach (string scheme in new[] { "http", "https" })
            {
                UriParser parser = (UriParser)getSyntax.Invoke(null, new object[] { scheme });
                if (parser != null)
                {
                    int flagsValue = (int)flagsField.GetValue(parser);
                    // Clear the CanonicalizeAsFilePath attribute
                    if ((flagsValue & 0x1000000) != 0)
                        flagsField.SetValue(parser, flagsValue & ~0x1000000);
                }
            }
        }

        // Parsed again after the flag is cleared.
        url = new Uri(surl);
        return url;
    }

    With this method I get the correct URL after the conversion, and WebClient successfully downloads the data from the dot-ended URL.

    This works fine and solves my problem.

    All credit goes to http://stackoverflow.com/questions/856885/httpwebrequest-to-url-with-dot-at-the-end
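
    Since the workaround above changes private state through reflection, it may be worth checking first whether the running framework drops the trailing dot at all, because the behaviour can differ between framework versions. The following is only a minimal sketch: the class and method names are hypothetical, and example.com is a placeholder address that is parsed but never requested.

```csharp
using System;

static class TrailingDotProbe
{
    // Hypothetical helper: parses a placeholder address that ends with a dot
    // and reports whether the dot is still present in the canonical form.
    public static bool UriKeepsTrailingDot()
    {
        var probe = new Uri("http://example.com/path.");
        return probe.AbsoluteUri.EndsWith(".", StringComparison.Ordinal);
    }
}
```

    A caller could apply ManipulateBrokenUrl_Ragesh only when TrailingDotProbe.UriKeepsTrailingDot() returns false, and use a plain new Uri(surl) otherwise.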

    Thanks


    rageshS


    • Marked as answer by RageshShiva Thursday, November 15, 2012 9:23 AM
    • Unmarked as answer by RageshShiva Thursday, November 15, 2012 9:23 AM
    • Marked as answer by RageshShiva Thursday, November 15, 2012 9:23 AM
    • Edited by RageshShiva Thursday, November 15, 2012 9:25 AM
    Thursday, November 15, 2012 9:22 AM

All replies

  • It downloads async without problem.
    Monday, November 5, 2012 9:08 AM
  • Thank you for your valuable reply.

    Please explain or give an example.


    rageshS

    Monday, November 5, 2012 9:11 AM
  • Look in help for "WebClient.DownloadStringAsync" method.

    Monday, November 5, 2012 9:18 AM
  • I changed the code as follows:

     public void GetWebData(string url)
            {
                using (WebClient client = new WebClient())
                {
                    Uri innUri = null;
                    Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out innUri);
    
                    try
                    {
                        client.Headers.Add("Accept-Language", " en-US");
                        client.Headers.Add("Accept-Encoding", "gzip, deflate");
                        client.Headers.Add("Accept", " text/html, application/xhtml+xml, */*");
                        client.Headers.Add("User-Agent", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)");
                        client.DownloadStringCompleted += new DownloadStringCompletedEventHandler(client_DownloadStringCompleted);
    
                        client.DownloadStringAsync(innUri);
                    }
                    catch (WebException we)
                    {
                        throw; // rethrow without resetting the stack trace
                    }
                }
            }
            void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
            {
                if (e.Error == null)
                {
                    string res = e.Result;
                }
            }

    But I get the same error in e.Result. Please see the image below.


    Please help.


    rageshS

    Monday, November 5, 2012 10:08 AM

  • I am very confused.

    The actual URL, which works in the browsers, is http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.

    Please note the dot at the end of the URL. When I pass this URL as a function parameter, the dot is removed and the URL points to an invalid path.

    I use this code to convert the string URL to a System.Uri:

     Uri innUri = null;
     Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out innUri);

    After this conversion I get the URL

    http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c    (without the dot).

    This path is invalid. I am clueless. Is there a fault in my Uri conversion?
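
    One way to see what the conversion does is to compare the text that was passed in with the canonical form that System.Uri produces. This is only a diagnostic sketch: the class and method names are hypothetical, and the two properties can also differ for unrelated reasons (for example an added trailing slash or a lower-cased host), so a difference does not by itself prove that the dot was removed.

```csharp
using System;

static class UriRewriteCheck
{
    // Returns true when the canonical form differs from the text passed in.
    public static bool WasRewritten(string address)
    {
        var uri = new Uri(address, UriKind.Absolute);
        // OriginalString is the input text; AbsoluteUri is the canonical form.
        return !string.Equals(uri.OriginalString, uri.AbsoluteUri, StringComparison.Ordinal);
    }

    // Prints both forms so they can be compared by eye.
    public static void Report(string address)
    {
        var uri = new Uri(address, UriKind.Absolute);
        Console.WriteLine("Original : " + uri.OriginalString);
        Console.WriteLine("Canonical: " + uri.AbsoluteUri);
    }
}
```

    Calling UriRewriteCheck.Report(url) with the dot-ended address before the WebClient request shows whether the dot is already lost at the Uri stage.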

    Please Help.



    rageshS


    • Edited by RageshShiva Monday, November 5, 2012 12:27 PM
    Monday, November 5, 2012 12:26 PM
  • This code works for me.  If it doesn't work for you, try navigating to the url in your browser first and try again.

    private void button1_Click(object sender, EventArgs e)
    {
        WebClient WC = new WebClient();
        WC.DownloadStringCompleted += WC_DownloadComplete;
        WC.DownloadStringAsync(new Uri("http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c."));
    }
    public void WC_DownloadComplete(object sender, DownloadStringCompletedEventArgs e)
    {
        if (e.Error != null)
        {
            MessageBox.Show(e.Error.ToString());
        }
        else
        {
            string html = e.Result;
        }
    }


    • Edited by JohnWein Monday, November 5, 2012 1:09 PM
    Monday, November 5, 2012 1:06 PM

  • Thank you for your code samples.

    Sorry, it's not working.

    I tested this URL with my WebBrowser control. Navigate(string url) works fine:

     MainBrowser.Navigate("http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.");

    but Navigate(Uri url) does not work. It always shows a 404 error for this existing URL:

     MainBrowser.Navigate(new Uri("http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c."));

    When I convert the URL string to a System.Uri, the actual URL is changed into an invalid URL.

    Any ideas? Please help.





    rageshS


    • Edited by RageshShiva Tuesday, November 6, 2012 8:27 AM
    Tuesday, November 6, 2012 8:26 AM
  • It could be a culture or encoding difference.  I'm in the USA.
    Tuesday, November 6, 2012 10:06 AM
  • Thanks.

    I'm in India. Do you mean using System.Globalization.CultureInfo with System.Uri?


    rageshS

    Tuesday, November 6, 2012 11:00 AM
  • When you define your Uri string, put an @ in front of the string so the period doesn't get removed.

    string uri = @"http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.";

    If this doesn't work, use Wireshark to trace the connection to get a better idea of which header is getting rejected.
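
    For reference, the @ prefix in C# only changes how backslash escape sequences in a literal are read. The address above contains no backslashes, so both spellings should produce the same string, and the @ would not be expected to affect the trailing dot. A minimal sketch to confirm this (the class and member names are hypothetical):

```csharp
using System;

static class VerbatimLiteralDemo
{
    // The same address written as a regular and as a verbatim (@) literal.
    public const string Regular = "http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.";
    public const string Verbatim = @"http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.";

    // Compares the two literals character by character.
    public static bool SameText()
    {
        return string.Equals(Regular, Verbatim, StringComparison.Ordinal);
    }
}
```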


    jdweng

    Tuesday, November 6, 2012 11:38 AM
  • This code gives me a 500, Internal Server Error, but works reliably with the commented line uncommented.

        private void button1_Click(object sender, EventArgs e)
        {
          WebClient WC = new WebClient();
          WebBrowser WB = new WebBrowser();
          //WB.Navigate("http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.");
          string Html = WC.DownloadString("http://www.paginegialle.it/roma-rm/abbigliamento/alberto-aspesi-c.");
        }
    

    Tuesday, November 6, 2012 9:46 PM
  • Thank you very much, JohnWein, for your effort.

    Is there no way to download this URL with WebClient? Is it not possible? Please explain how an encoding difference could affect the WebClient DownloadString method.


    rageshS

    Wednesday, November 7, 2012 6:18 AM
  • "There is no way to download this url to WebClient. it's not possible ?."

    Certainly it's possible using the code I posted in the USA.  I'm not familiar enough with cultures and encodings to advise you how to get it to work in your culture.

    Wednesday, November 7, 2012 8:25 AM
  • It is possible. Not all websites are the same, and there may be virus protection software on your PC and/or a firewall that blocks certain port numbers.  Some websites are secure and/or require a proxy or a cookie.  Each situation is different.

    Normally the best way to proceed is to use your IE and connect to the website while tracing with Fiddler.  Capture the TCP and HTTP messages.  Then repeat with your Visual Studio application and compare the two results.  Once we know exactly what is different between the two captures we can better help give you a solution.  I suspect there is a difference during the header negotiation between your client application and the server.   Possibly the server has HTTP/1.1 configured and your application is only recognizing HTTP/1.0.  Not sure.


    jdweng

    • Proposed as answer by Mike FengModerator Thursday, November 8, 2012 12:07 PM
    • Unproposed as answer by RageshShiva Friday, November 9, 2012 3:44 AM
    Wednesday, November 7, 2012 10:21 AM
  • Joel, thank you very much for your great effort.

    There is no antivirus application installed on my development PC.

    I changed my function:

    public void GetWebData(string url)
    {
        using (WebClient client = new WebClient())
        {
            Uri innUri = null;
            Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out innUri);
            try
            {
                client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";
                client.Headers[HttpRequestHeader.ContentType] = "text/html;charset=utf-8";
                client.Headers[HttpRequestHeader.Accept] = "text/html, application/xhtml+xml, */*";
                client.Headers[HttpRequestHeader.AcceptLanguage] = "en-US";
                client.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";
                client.Headers[HttpRequestHeader.Pragma] = "no-cache";
                string bytea = client.DownloadString(innUri);
            }
            catch (WebException)
            {
                // Rethrow without resetting the stack trace.
                throw;
            }
        }
    }

    I checked with Fiddler using my URL in IE9 and in the Windows application. Please see the images below.

    IE result

    Application

    But a 404 still occurs. Is there any header needed in WebClient?


    rageshS

    Friday, November 9, 2012 9:51 AM
  • If the IE application works then the antivirus is not an issue.  It is obvious that the IE application is sending cookies.  To get the WebClient to send cookies you need to use the default credentials like below:

                using (WebClient client = new WebClient())
                {
                    client.UseDefaultCredentials = true;
                }

    If the default credentials give the same results, then see the web page below:

    http://www.codeproject.com/Articles/196549/WebClient-Class-with-Cookies

    Note that the new class created there inherits from the WebClient class:

    class WebClientWithCookies: WebClient
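
    A class of that shape could look roughly like the sketch below. This is not the code from the linked article: the class name is taken from the line above, while the field and property names are illustrative assumptions. Note that UseDefaultCredentials controls whether the logged-on user's credentials are sent for authentication, whereas cookies are carried by the CookieContainer attached to each HttpWebRequest.

```csharp
using System;
using System.Net;

// Sketch of a cookie-aware client; member names are illustrative.
class WebClientWithCookies : WebClient
{
    private readonly CookieContainer _cookies = new CookieContainer();

    // Exposes the container so callers can inspect or pre-load cookies.
    public CookieContainer Cookies { get { return _cookies; } }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null)
        {
            // Attach the shared container so cookies set by one response
            // are sent with later requests made through this client.
            http.CookieContainer = _cookies;
        }
        return request;
    }
}
```

    It is used like a normal WebClient, for example inside the same using block as in the earlier posts.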


    jdweng

    Friday, November 9, 2012 10:20 AM
  • Hi Ragesh,

    Do you have any update?

    Best regards,


    Mike Feng
    MSDN Community Support | Feedback to us
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Wednesday, November 14, 2012 1:37 PM
    Moderator
  • Thank you Mike for asking.

    rageshS


    • Edited by RageshShiva Thursday, November 15, 2012 9:06 AM
    Thursday, November 15, 2012 9:02 AM