Why does using HttpClient cause 503 errors quite often?

  • Question

  • Hello:

    I have a C# .NET Core project that downloads around 200 web pages after I log in to a web site.

    The following is my C# code to download the HTML from a URL:

    public static HttpClient Create_HttpClient()
    {
        try
        {
            ServicePointManager.Expect100Continue = false;
            ServicePointManager.DefaultConnectionLimit = 1000;
            ServicePointManager.SecurityProtocol =
                SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;
            ServicePointManager.ServerCertificateValidationCallback += (sender, cert, chain, sslPolicyErrors) => true;
            HttpClientHandler clientHandler = new HttpClientHandler()
            {
                AllowAutoRedirect = true,
                AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip,
            };
            HttpClient _client1 = new HttpClient(clientHandler);
            _client1.DefaultRequestHeaders.Accept.Clear();
            _client1.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate");
            _client1.DefaultRequestHeaders.Add("X-Requested-With", "XMLHttpRequest");
            return _client1;
        }
        catch (Exception ex)
        {
            //Don't swallow failures silently
            Console.WriteLine(ex.Message);
        }
        return null;
    }
    
    public static async Task<string> Get_WebContent(string url1)
    {
        try
        {
            using (HttpClient client1 = Create_HttpClient())
            using (HttpResponseMessage http_reply1 = await client1.GetAsync(url1))
            {
                string html_content = await http_reply1.Content.ReadAsStringAsync();
                if ((html_content != "") &&
                    (!html_content.Contains("503 Service Temporarily Unavailable")))
                {
                    string page_html = html_content.Replace("\n", "");
                    return page_html;
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
        return null;
    }
    
    static async Task Main()
    {
        List<string> web_links = new List<string>();
        for (int i = 1; i <= 200; i++)
        {
            string page_url1 = string.Format("https://myweb.com/markets/page={0}", i);
            web_links.Add(page_url1);
        }

        for (int i = 0; i < web_links.Count; i++)
        {
            using (HttpClient client1 = Create_HttpClient())
            using (HttpResponseMessage http_reply1 = await client1.GetAsync(web_links[i]))
            {
                string html_content = await http_reply1.Content.ReadAsStringAsync();
                if ((html_content != "") &&
                    (!html_content.Contains("503 Service Temporarily Unavailable")))
                {
                    string page_html = html_content.Replace("\n", "");
                }
            }
        }
    }
    

    I can run my program, but nearly half of the time I get the 503 Error: Service Temporarily Unavailable. (Sometimes most of the links (70%+) return the 503 error.)

    But if I use a web browser to visit each of the web links, most of the time (90%+ of the time) I can see the content.

    However, visiting all 200 web links in a web browser takes too much time; every web link redirects, and due to timeout issues the browser gets stuck from time to time, so using a web browser to visit all 200 web links is not suitable for my C# .NET Core project.

    I guessed that the web browser can wait longer for the DOM to download, but even when I changed the timeout on the HttpClient, it didn't help at all.
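
    For reference, the timeout change I tried was along these lines (a sketch of the usual way to set it):

    //Sketch: raising HttpClient's default 100-second timeout
    _client1.Timeout = TimeSpan.FromMinutes(5);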

    I also tried the following code:

    IEnumerable<Task> download_all_links = web_links.Select(x => Get_WebContent(x));
    await Task.WhenAll(download_all_links);

    However, I found this was much worse compared to my previous code: I got the 503 error for nearly 100% of the web links.

    Please advise how I can fix my issue: using the HTTP client I quite often get the 503 error, while using a web browser I don't, yet using a web browser is not suitable for a C# .NET Core project.

    By the way, I am using Visual Studio 2019 Version 16.2.5 on Windows 10 (Version 1903).

    Thanks,

    Monday, September 16, 2019 10:51 PM


All replies

  • The 503 is a Web-server-side error. There can be several reasons why the Web server returned the 503: the Web server is overloaded, a Web program on the Web server is taxed, and many other reasons. Since you don't have control of the Web server to debug things, you're between a rock and a hard place.

    Monday, September 16, 2019 11:19 PM
  • Hi zydjohn,

    Thank you for posting here.

    Based on my search, the following link describes how to handle the 503 error with exception handling.

    HttpWebRequest Error: 503 server unavailable
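
    With HttpClient, you can also inspect the status code directly instead of relying on an exception. A minimal, untested sketch, assuming the client1 and url1 variables from your question:

    using (HttpResponseMessage response = await client1.GetAsync(url1))
    {
        //A 503 arrives as a status code, not as page text
        if (response.StatusCode == HttpStatusCode.ServiceUnavailable)
        {
            //Server is overloaded: wait a bit and retry later
        }
    }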

    Hope this could help you.

    Best Regards,

    Jack


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, September 17, 2019 5:29 AM
    Moderator
  • Because you're overwhelming the server, and it starts failing your calls. You cannot realistically expect to download all of that at the same time. You need to do batch processing. In general, if you're downloading content I would recommend fewer than 10 simultaneous connections. However, the remote server may see your attempts as a DDoS attack and block you, so the fewer the better.
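
    A minimal batching sketch (untested; it assumes the Get_WebContent method and web_links list from your question):

    //Download the links in batches of 10, waiting for each batch
    //to finish before starting the next one
    const int batchSize = 10;
    for (int offset = 0; offset < web_links.Count; offset += batchSize)
    {
       var batch = web_links.Skip(offset)
                            .Take(batchSize)
                            .Select(url => Get_WebContent(url));
       await Task.WhenAll(batch);
    }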

    Michael Taylor http://www.michaeltaylorp3.net

    Tuesday, September 17, 2019 2:11 PM
    Moderator
  • Hello:

    Thanks for your reply, but can you provide a code sample showing how to do this? My current solution is to put my code above in a timer; each time the timer fires, my code tries to download all the links missed in the last iteration. So, after a few iterations, most of the links are downloaded.

    But I don't know if that is a good idea; I want to see how you would solve this.

    Thanks,

    Tuesday, September 17, 2019 9:35 PM
  • There are different ways to do it depending upon your needs, but I wouldn't use a timer. The issue with a timer is that it may fire before the previous run has finished.

    Given your existing code, it might be easiest to clean up your download logic and then use a semaphore to control how many downloads run at the same time. Modify your download logic to return a Task. Inside that task, acquire the semaphore, download the contents, and then release the semaphore; the semaphore controls the parallel-execution part. You can then enumerate the URLs, call the function for each one, and wait for them all to complete.

    //NOT TESTED
    //Field in class, allow up to 5 requests at same time
    private SemaphoreSlim _semaphore = new SemaphoreSlim(5);
    
    private async Task Get_WebContent ( HttpClient client, string url )
    {
       //Block until you can run
       _semaphore.Wait();
    
       try
       {
          using (var response = await client.GetAsync(url))
          {
             response.EnsureSuccessStatusCode();
    
             //Do your processing of response
          }
       }
       finally
       {
          _semaphore.Release();
       }
    }
    
    //Main code
    static void Main ()
    {
       var program = new Program();
    
       program.Run();
    }
    
    private void Run ()
    {
       var urls = GetUrls();
    
       var tasks = new List<Task>();
    
       var client = InitializeClient();
       foreach (var url in urls)
       {
          var task = Get_WebContent(client, url);
          tasks.Add(task);
       }
    
       //Wait
       Task.WaitAll(tasks.ToArray());
    }
    
    private HttpClient InitializeClient ()
    {
       var client = new HttpClient();
       ...
       return client;
    }

    Note that it is very important that you DO NOT create an HttpClient multiple times. Furthermore, you should not clean it up. Doing so is going to cause havoc and may be the cause of your current problems. HttpClient is designed to be created once and reused for the life of the app. Generally we do this once per unique URL, which wouldn't work in your case.
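
    In sketch form, reusing your existing Create_HttpClient:

    //Created once, reused for every request, never disposed
    private static readonly HttpClient _client = Create_HttpClient();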

    There are other options as well, including a limited-concurrency task scheduler (which doesn't play well with async/await), Parallel.ForEach with a partitioner, etc.
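
    For completeness, an untested sketch of the Parallel.ForEach option; the blocking GetAwaiter().GetResult() call is exactly why this approach doesn't mix well with async/await:

    //urls is a List<string>; partition it and fan out across a
    //bounded number of threads
    var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };
    Parallel.ForEach(Partitioner.Create(urls, loadBalance: true), options, url =>
    {
       //Blocks the worker thread until the download completes
       var html = client.GetStringAsync(url).GetAwaiter().GetResult();
       //Process html here
    });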


    Michael Taylor http://www.michaeltaylorp3.net

    Tuesday, September 17, 2019 10:00 PM
    Moderator
  • Hello:

    Thank you very much: I changed my code according to your sample. It basically works, with one minor issue: quite often one link is missing. I mean, out of 200 web links, I can successfully download 199 with the HTTP client, but with the semaphore the last web link is always missing from the downloaded links. Even though it is not so important, since I got 99.5% of the links, I want to know what can cause one web link to go missing. I have tried more than 20 times and got the same result each time: 199 out of 200 web links work, but one does NOT!

    PS: I changed the semaphore to use count 1, as I found that gives the best result. And I can see the error when the HTTP client visits all the web links: for the last link, I saw the error message: The operation was canceled.

    Thanks,


    • Edited by zydjohn Wednesday, September 18, 2019 10:22 PM Typo
    Wednesday, September 18, 2019 9:50 PM
  • If you set the semaphore count to 1 then you aren't running anything in parallel anymore. You can eliminate the semaphore and the WaitAll completely and simply loop through your URLs, awaiting each task.

    As for the 199 out of 200, I'm going to guess there is an off-by-one error somewhere. Can you post the code?
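
    For example, a 1-based loop over a 0-based list is the classic shape of such a bug:

    //web_links has indices 0..199, so this loop skips the first
    //link and throws on its last iteration
    for (int i = 1; i <= 200; i++)
    {
       var url = web_links[i];
    }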


    Michael Taylor http://www.michaeltaylorp3.net

    Thursday, September 19, 2019 4:22 AM
    Moderator
  • Hello:
    I found another issue: using the semaphore to download all the web links does not seem very stable. Sometimes the code doesn't run at all; in debug mode, if I add a breakpoint at
    _semaphore.Wait(); the code is never reached.
    In that case, if I restart my program, it runs well again.
    Now, most of the time it misses only one web link, for which I see the error: The operation was canceled.
    But sometimes it works perfectly, and I don't see the operation-canceled error at all.
    However, if I delete the semaphore, the result is terrible: for nearly half of the web links I get the 503 Service Unavailable error again.
    So using the semaphore is necessary, as without it it is very difficult or even impossible to download all the web links; but using the semaphore has a stability issue, and sometimes it doesn't work at all.
    Please advise why I encounter these stability issues.
    Thanks,
    Thursday, September 19, 2019 1:53 PM
  • It sounds like your semaphore code isn't correct. We use this type of stuff all the time and have no issues. Please post your updated code.

    Michael Taylor http://www.michaeltaylorp3.net

    Thursday, September 19, 2019 5:06 PM
    Moderator
  • Hello: 

    The following is my code. My goal is to download the HTML contents from 200 web links, parse one hidden input field, get the token from that input field, and save the tokens in a dictionary. Since the HTML contents change from time to time, I have to run my code about every 10 minutes.

    private static SemaphoreSlim _semaphore = new SemaphoreSlim(1);
    private static ConcurrentDictionary<int, string> Dtokens = new ConcurrentDictionary<int, string>();
    
    private static async Task Get_WebContent(int id1, HttpClient client, string url)
    {
        _semaphore.Wait();
        try
        {
            using (var response = await client.GetAsync(url))
            {
                string html_content = await response.Content.ReadAsStringAsync();
                string page_html = html_content.Replace("\n", "");
                if ((page_html != "") && (!page_html.Contains("503 Service Temporarily Unavailable")))
                {
                    string reg_token1 = Regex.Match(page_html, @"<input type=""hidden"" name=""csrftoken""(.*?)>").Value;
                    string token1 = reg_token1.Substring(38);
                    Dtokens.AddOrUpdate(id1, token1, (k, v) => v);
                }
            }
        }
        finally
        {
            _semaphore.Release();
        }
    }
    
    private static HttpClient Create_HttpClient()
    {
        try
        {
            ServicePointManager.Expect100Continue = false;
            ServicePointManager.SecurityProtocol =
                SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;
            ServicePointManager.ServerCertificateValidationCallback += (sender, cert, chain, sslPolicyErrors) => true;
            HttpClientHandler clientHandler = new HttpClientHandler()
            {
                AllowAutoRedirect = true,
                AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip,
            };
            HttpClient _http_client1 = new HttpClient(clientHandler);
            _http_client1.DefaultRequestHeaders.Accept.Clear();
            _http_client1.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("*/*"));
            _http_client1.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate");
            _http_client1.DefaultRequestHeaders.AcceptLanguage.Add(new StringWithQualityHeaderValue("en-US"));
            return _http_client1;
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
        return null;
    }
    
    static async Task Main()
    {
        List<int> token_ids = new List<int>();
        Dictionary<int, string> Dweb_links = new Dictionary<int, string>();
        for (int i = 1; i <= 200; i++)
        {
            string page_url1 = string.Format("https://myweb.com/markets/page={0}", i);
            token_ids.Add(i);
            Dweb_links.Add(i, page_url1);
        }
        List<Task> tasks = new List<Task>();
        HttpClient client1 = Create_HttpClient();
        foreach (int id1 in token_ids)
        {
            bool has_token = Dtokens.TryGetValue(id1, out string token1);
            bool has_web_link = Dweb_links.TryGetValue(id1, out string web_link1);
            if (!has_token && has_web_link)
            {
                Task task = Get_WebContent(id1, client1, web_link1);
                tasks.Add(task);
            }
        }
        Task.WaitAll(tasks.ToArray());
    }
    
    

    Now I have 2 issues. First, from time to time the code does not work at all. Second, when it works, the HTML content for one web link is missing from time to time; yet sometimes it works perfectly. Please advise what I can do to fix these issues.

    Thanks,

    Thursday, September 19, 2019 6:12 PM
  • I think the problem is in your concurrent dictionary logic. Is there any particular reason you're using it? Why can't you just enumerate your URLs and retrieve the data for each one? Are some URLs related to others such that you have to track this token thing?

    Personally, I would recommend that you separate the retrieval of the data from your post-processing. And I'll mention again: if you are going to set the semaphore count to 1, then you are no longer running these in parallel and there is no reason to use the semaphore at all; just remove that logic altogether.

    private static SemaphoreSlim _semaphore = new SemaphoreSlim(5);
    
    private static async Task<string> Get_WebContent(HttpClient client, string url)
    {
        _semaphore.Wait();
        try
        {
            using (var response = await client.GetAsync(url))
            {
                //Always verify success first...
                //Note that this eliminates the need for your 503 check below; if you need to check the status code, use the response property instead
                response.EnsureSuccessStatusCode();
    
                var body = await response.Content.ReadAsStringAsync();
    
                var html = body.Replace("\n", "");
                var reg_token1 = Regex.Match(html, @"<input type=""hidden"" name=""csrftoken""(.*?)>").Value;
    
                return reg_token1.Substring(38);
            }
        }
        finally
        {
            _semaphore.Release();
        }
    }
    
    private static HttpClient Create_HttpClient()
    {
        try
        {
            ServicePointManager.Expect100Continue = false;
            ServicePointManager.SecurityProtocol =
                SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;
            ServicePointManager.ServerCertificateValidationCallback += (sender, cert, chain, sslPolicyErrors) => true;
            HttpClientHandler clientHandler = new HttpClientHandler()
            {
                AllowAutoRedirect = true,
                AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip,
            };
            HttpClient _http_client1 = new HttpClient(clientHandler);
            _http_client1.DefaultRequestHeaders.Accept.Clear();
            _http_client1.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("*/*"));
            _http_client1.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate");
            _http_client1.DefaultRequestHeaders.AcceptLanguage.Add(new StringWithQualityHeaderValue("en-US"));
            return _http_client1;
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
        return null;
    }
    
    static IEnumerable<string> GetUrls(int count)
    {
        for (var index = 1; index <= count; ++index)
            yield return $"https://myweb.com/markets/page={index}";
    }
    
    static async Task Main()
    {
        var tasks = new List<Task<string>>();
    
        var client = Create_HttpClient();
    
        foreach (var url in GetUrls(200))
        {
            var task = Get_WebContent(client, url);
            tasks.Add(task);
        }
    
        Task.WaitAll(tasks.ToArray());
    
        //Get the tokens
        var tokens = tasks.Select(t => t.Result);
    }


    Michael Taylor http://www.michaeltaylorp3.net

    Thursday, September 19, 2019 8:59 PM
    Moderator
  • Hello:
    Thanks for your code. The reason for the concurrent dictionary is that I need to save the token for each web link; later on, I use the token to HTTP POST some data to each web link. As each web link can be uniquely identified by its id (an int), saving the tokens in a dictionary seemed to be the only option I could take.
    Looking at your code, the major issue for me is: how can I tell which token belongs to which unique id?
    For example, take the first web link: https://myweb.com/markets/page=1
    Suppose the token I got is token1 = "ABC".
    How can I save its value ("ABC") for id=1 (https://myweb.com/markets/page=1)
    if I don't use a dictionary to save it?
    Thanks,
    Thursday, September 19, 2019 10:31 PM
  • A simple type solves that problem.

    public class MyData
    {
       public string Url { get; set; }
    
       //Add whatever data you want
       public string Token { get; set; }
    }
    
    private static async Task<MyData> Get_WebContent ( string url )
    {
       var data = new MyData() {
          Url = url
       };
    
       ...
       data.Token = ...;
    
       return data;
    }
    
    static void Main ()
    {
       ...
       
       //Get the results (IEnumerable<MyData>)
       var items = tasks.Select(t => t.Result);
    }
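
    If the numeric id is still needed later (for the HTTP POST), it can be recovered from the Url instead of being carried in a dictionary. A hypothetical helper, assuming the page={id} pattern from the earlier posts:

    private static int GetIdFromUrl ( string url )
    {
       //Hypothetical: the id is the number after the final '=' in
       //https://myweb.com/markets/page={id}
       var index = url.LastIndexOf('=');
       return int.Parse(url.Substring(index + 1));
    }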


    Michael Taylor http://www.michaeltaylorp3.net

    Thursday, September 19, 2019 11:54 PM
    Moderator
  • Hello:
    Thank you very much for your code.
    I changed my code based on your code sample, and I did a few more tests.
    When I set the semaphore count to 1, it works most of the time.
    If I set the semaphore count to 2, I get about 20% of the HTML contents; set to 3, 4 or 5, I can only get about 10% to 15% of the HTML contents.
    If I set the semaphore count to 10, I get less than 5% of the HTML contents.
    I don't think there is any issue with the code, but I suspect the web server is not powerful enough to serve HTML contents for multiple HTTP GET requests in parallel. That would explain why I can always use a web browser to visit the web links: the browser gets the job done one web link at a time.
    Actually, even a web browser does not work all the time; about 20% of the time I see 503 errors in the browser too. When I use Chrome to visit the first web page and stay on it, then reload the page a few times by clicking the refresh icon, one time out of 5 I see a 503 error.
    But this web site just shut down and made some improvements within the past 2 weeks, after not making such improvements for 2 years. So I would guess the new web server should be powerful enough to provide the service, but it seems not to be.
    And when using the HTTP client without the semaphore, I see the 503 error half of the time, even without any parallel requests.
    Do you think the server side uses some way of refusing to answer multiple HTTP requests, or are the new web servers simply not powerful enough to do the job?
    PS: From my testing, the statement
    response.EnsureSuccessStatusCode();
    is not working: with this statement I always saw 503 errors.
    So I changed it to use this statement instead:
    if (response.IsSuccessStatusCode == true)
    { ... }
    Thanks,
    Friday, September 20, 2019 9:43 AM
  • You have no control over whether the server can handle the request at the time or not. In general, web servers should be able to handle hundreds of simultaneous connections, so it shouldn't be an issue.

    Your `IsSuccessStatusCode` check doesn't really do anything. `EnsureSuccessStatusCode` is the method you should call: it checks this property and throws an exception if the status is not success (2xx). If that call isn't failing, then the server is returning a success response, and that doesn't make sense for a 503. A 503 is a server error, so the `Ensure` call will fail. Set a breakpoint on the response and check the status code. If you are getting 503 in the HTML but the status code is 200, then the server is incorrectly coded. That would be exceedingly rare.

    For cases where you are working with an unstable server, consider a retry policy: try to connect, and if that fails, wait a little and try again. Making a couple of attempts before considering the request failed is generally recommended. You can look into Polly, which supports this for HttpClient, but it may not play well with your other tasks.
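
    An untested sketch of a small retry helper (Polly packages the same idea as a reusable policy):

    private static async Task<HttpResponseMessage> GetWithRetryAsync ( HttpClient client, string url )
    {
       for (var attempt = 1; ; attempt++)
       {
          var response = await client.GetAsync(url);
    
          //Give up after 3 attempts, or on anything other than 503
          if (response.StatusCode != HttpStatusCode.ServiceUnavailable || attempt == 3)
             return response;
    
          response.Dispose();
          await Task.Delay(TimeSpan.FromSeconds(2 * attempt));
       }
    }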

    As for the tasks, you should have no problem running more than one at a time; again, servers can handle hundreds of calls. If you are getting 503s whenever you have more than one connection, then the issue might be the server side blocking simultaneous requests. That would be really odd, since applications do that all the time. Nevertheless, if you only want to run one task at a time, you don't need all the extra logic in Main anymore; just use a simple async/await loop.

    static async Task Main()
    {
       var client = Create_HttpClient();
    
       foreach (var url in GetUrls(200))
       {
          var result = await Get_WebContent(client, url);
          //Process the results      
       }
    }


    Michael Taylor http://www.michaeltaylorp3.net

    • Marked as answer by zydjohn Friday, September 20, 2019 8:10 PM
    Friday, September 20, 2019 2:13 PM
    Moderator
  • Hello:
    Thanks for your reply; I will try what you said. But now I have another issue I may need your help with.
    After I get the token for a web link, I use the token to do an HTTP POST to the web site for one specific page.
    It is actually an order-placing web site, rather like buying/selling stocks.
    When I see some offers available, I want to place an order by HTTP POST using the token and application/x-www-form-urlencoded form data, something like this:
    csrftoken=XYZ&side=0&price=1.10&market=123&runner=456&type=1&price_formatted=1.10&amount=1.00
    Whenever I see some offers available via the web browser, my HTTP POST always gets the 503 Service Temporarily Unavailable error, while submitting the order through the web browser always works (or works 99% of the time, without the 503 error).
    But whenever there are no offers available via the web browser, my HTTP POST request goes through, and I get the following JSON reply:
    {
      "canceled": true,
      "matching_id": false,
      "amount": 0
    }
    I sent the HTTP POST data with the REST client Insomnia 6.6.2, so I can see the server's reply and save it in the HTTP session.
    I want to know whether my HTTP POST request is correct in format. And why is it that when some offers are available I can never send the HTTP POST request due to the 503 error, yet when there are no offers available my HTTP POST requests go through, only to get the canceled reply?
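    For reference, I build the POST along these lines (a sketch; order_url is a placeholder, and the field values come from the example above):
    var form = new FormUrlEncodedContent(new Dictionary<string, string>
    {
        ["csrftoken"] = "XYZ",
        ["side"] = "0",
        ["price"] = "1.10",
        ["market"] = "123",
        ["runner"] = "456",
        ["type"] = "1",
        ["price_formatted"] = "1.10",
        ["amount"] = "1.00",
    });
    //order_url is a placeholder for the order-placing page
    using (var response = await client.PostAsync(order_url, form))
    {
        string reply = await response.Content.ReadAsStringAsync();
    }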
    Thanks,
    Friday, September 20, 2019 6:53 PM
  • I think that is a different question and is probably related to the site cookies. I recommend you post it as a separate question.

    Michael Taylor http://www.michaeltaylorp3.net

    Friday, September 20, 2019 7:18 PM
    Moderator