Throttling errors from Queue under low load (HTTP status code 503)

    Question

  • Hi,

    Starting about 14 days ago (at the end of January 2016), we have been getting frequent exceptions from the CloudQueue.GetMessage .NET API. The exception is Microsoft.WindowsAzure.Storage.StorageException, and the message is "The remote server returned an error: (503) Server Unavailable.". The Queue log contains entries with the following details: "GetMessage;ThrottlingError;503".

    This happens many times a day (50 or more). The application had been running for around 3 months before this without any throttling-related issues.
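    The daily error count can be tallied directly from the Storage Analytics logs (stored in the `$logs` container). A minimal C# sketch, assuming the version 1.0 semicolon-delimited log format, where the request start time, operation type, request status and HTTP status code are the 2nd to 5th fields. The class and method names are hypothetical, and downloading the log blobs is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the Azure SDK: tallies throttled GetMessage
// requests per UTC day from Storage Analytics log lines (version 1.0 format).
public static class ThrottlingLogCounter
{
    public static SortedDictionary<DateTime, int> CountThrottledGetMessagePerDay(
        IEnumerable<string> logLines)
    {
        var perDay = new SortedDictionary<DateTime, int>();
        foreach (string line in logLines)
        {
            // Assumed layout: 0 = log version, 1 = request start time,
            // 2 = operation type, 3 = request status, 4 = HTTP status code.
            // Only these leading fields are read, so later quoted fields that
            // may contain ';' do not affect the values we rely on.
            string[] fields = line.Split(';');
            if (fields.Length < 5) continue;
            if (fields[2] != "GetMessage" || fields[3] != "ThrottlingError" || fields[4] != "503")
                continue;

            // Keep the timestamp in UTC so entries are grouped by UTC day.
            DateTime start = DateTime.Parse(fields[1], CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
            perDay.TryGetValue(start.Date, out int count);
            perDay[start.Date] = count + 1;
        }
        return perDay;
    }
}
```

    Grouping by day (or by hour, with a small change to the key) shows whether the errors are spread evenly or arrive in bursts.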

    The queue is being polled from a WebJob (2 instances, continuous mode). Each instance polls once per second, so together the two instances issue about 2 GetMessage requests per second, i.e. one every 500 milliseconds on average. According to the docs, that is three orders of magnitude below the maximum number of transactions per second.
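    The load figures above can be checked with a few lines of arithmetic. A minimal C# sketch; the class and method names are hypothetical, and the target of 2,000 messages per second for a single queue is an assumption taken from the Azure Storage scalability targets that should be checked against the current documentation.

```csharp
using System;

// Back-of-the-envelope load estimate for N pollers, each calling GetMessage
// once per poll interval (hypothetical helper).
public static class QueueLoadEstimate
{
    public static double RequestsPerSecond(int instances, double pollIntervalSeconds) =>
        instances / pollIntervalSeconds;

    // Average gap between consecutive requests across all pollers.
    public static double MeanGapMilliseconds(double requestsPerSecond) =>
        1000.0 / requestsPerSecond;

    // Orders of magnitude (base 10) between the observed rate and a target rate.
    public static double OrdersOfMagnitudeBelow(double requestsPerSecond, double targetPerSecond) =>
        Math.Log10(targetPerSecond / requestsPerSecond);
}
```

    With the figures from the question, `RequestsPerSecond(2, 1.0)` gives 2 requests per second, `MeanGapMilliseconds(2)` gives 500 ms, and against the assumed target of 2,000 messages per second `OrdersOfMagnitudeBelow(2, 2000)` gives 3, up to floating-point rounding.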

    So we're a bit at a loss here - we have experimented with the polling frequency, but changing it does not change the behaviour.

    Could anyone help shed some light on this?

    Best,

    Mikkel


    Wednesday, February 10, 2016 7:13 AM

All replies

  • Hi,

    Thanks for posting here.

    I suggest you check the troubleshooting guide for 503 Server Unavailable errors for more details.

    http://blogs.msdn.com/b/benjaminperkins/archive/2013/03/01/some-tips-for-troubleshooting-503-http-status-codes.aspx

    Girish Prajwal

    Wednesday, February 10, 2016 5:32 PM
    Moderator
  • Hi,

    Thanks for the reply. However, I can't see how the situation described in the linked article applies to 503 errors returned by the Azure Queue service.

    Best,

    Mikkel

    Wednesday, February 10, 2016 6:22 PM
  • Hi Mikkel,

    My sincere apologies for posting the incorrect link.

    There are two solutions to this problem. The first, which is completely unrealistic, is to turn off TCP delayed acknowledgments on the Azure side. The second is much easier: disable Nagle's algorithm for the queue endpoint used by GetMessage. Nagle is enabled by default in the .NET client; to turn it off, use the ServicePointManager .NET class.

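    using System.Net;
    using Microsoft.WindowsAzure.Storage;

    // Do this once at startup, before the client opens connections to the queue endpoint.
    CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
    ServicePoint queueServicePoint =
        ServicePointManager.FindServicePoint(account.QueueEndpoint);
    queueServicePoint.UseNagleAlgorithm = false;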

    Refer: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/06/25/nagle-s-algorithm-is-not-friendly-towards-small-requests.aspx

    Hope this helps.

    Girish Prajwal

    Wednesday, February 10, 2016 6:54 PM
    Moderator
  • Hi Girish,

    No problem :)

    About Nagle: the errors we are seeing occur during HTTP GETs (CloudQueue.GetMessage()). I understand (at least, I think I do) how Nagling adds latency to PUT/POST requests, but I was under the impression that it has no effect on HTTP GETs (as there is no payload). Am I mistaken?
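    For reference, Nagle's algorithm (RFC 896) only holds back a small segment while earlier data on the same connection is still unacknowledged. The simplified C# sketch below models that send decision; the class and parameter names are hypothetical, and real TCP stacks add further conditions (for example the receive window, which is ignored here). It illustrates the point in question: a request written in one piece with nothing outstanding goes out immediately, whereas a second small write (such as a request body sent after the headers) can wait for the ACK of the first, and delayed acknowledgment on the other side can stretch that wait.

```csharp
// Simplified model of Nagle's send rule (after RFC 896); hypothetical helper.
public static class NagleSketch
{
    // Returns true if a pending write would be put on the wire immediately.
    //  - With Nagle disabled (e.g. UseNagleAlgorithm = false), data is sent at once.
    //  - A full-sized segment (>= MSS) is always sent.
    //  - A small segment is sent at once only if no earlier data is unacknowledged;
    //    otherwise it is buffered until the outstanding data is ACKed.
    // The receive-window check of a real TCP stack is deliberately left out.
    public static bool SendNow(int pendingBytes, int maxSegmentSize, int unackedBytes, bool nagleEnabled)
    {
        if (!nagleEnabled) return true;
        if (pendingBytes >= maxSegmentSize) return true;
        return unackedBytes == 0;
    }
}
```

    For example, `SendNow(300, 1460, 0, true)` (a small GET written in one piece on an idle connection) is true, while `SendNow(100, 1460, 300, true)` (a small body written while the headers are still unacknowledged) is false.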

    Best,

    Mikkel

    Wednesday, February 10, 2016 8:05 PM
  • Hi,

    While not scientifically scrutinised, the Azure diagnostics charts (I can't add an image here; the forum blocks the upload) seem to indicate a correlation between the errors we are seeing and the jitter in the queue response time. I'm not sure that is useful info (and I realise that correlation is not causation), but I'm hoping it won't exactly hurt either...
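    To put a number on the correlation suggested by the charts, per-interval error counts can be compared with a per-interval jitter measure. A minimal C# sketch computing the Pearson correlation coefficient; the method name is hypothetical, and how the two series are built (for example throttling errors per minute from the logs and the standard deviation of queue latency per minute) is an assumption.

```csharp
using System;

public static class Correlation
{
    // Pearson correlation of two equally long series, e.g. throttling errors per
    // minute (xs) and latency jitter per minute (ys). Returns NaN if either series
    // is constant, because the correlation is undefined in that case.
    public static double Pearson(double[] xs, double[] ys)
    {
        if (xs.Length != ys.Length || xs.Length < 2)
            throw new ArgumentException("Series must have the same length (at least 2).");

        double meanX = 0, meanY = 0;
        for (int i = 0; i < xs.Length; i++) { meanX += xs[i]; meanY += ys[i]; }
        meanX /= xs.Length;
        meanY /= ys.Length;

        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < xs.Length; i++)
        {
            double dx = xs[i] - meanX, dy = ys[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.Sqrt(varX * varY);
    }
}
```

    A coefficient close to 1 would support the observation, though, as noted, it would still not establish causation.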

    Wednesday, February 10, 2016 9:12 PM