What is the optimal retry wait time?

  • Question

  • Hi,

    According to the documentation the explanation for an HTTP 503 is the following:


    This error means that the system is under heavy load and your request can't be processed at this time.


    In this case, we highly recommend that your client code back off and wait before retrying. This will give the system some time to recover, increasing the chances that future requests will succeed. Rapidly retrying your requests will only prolong the situation.


    All very well, but how long should the "client code back off and wait before retrying"?

    One second? Two seconds? Five minutes?

    I would add that it's just not feasible to wait more than a couple of seconds. The end-user cannot be kept waiting.


    Tuesday, December 2, 2014 7:09 PM


  • Hi Noel,

    You will see this most commonly when you are doing an extremely large initial upload (or an extremely large update) of documents in an index.  In many cases customers will parallelize the upload of their batches, which is a great idea; however, it also has the possibility of backing up your search service as it tries to index all of this data.  In this case you might start to see these 503s. 

    When we talk about backing off, we typically suggest an exponential backoff model: retry after 1 second, and if that fails, try again after 2 seconds, then 4 seconds, and so on, until the request succeeds. 
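That doubling schedule can be sketched as follows (a minimal illustration; `retry_with_backoff` and its parameters are hypothetical helpers, not part of any Azure Search SDK, and production code would typically also add random jitter and a total-time cap):

```python
import time

def retry_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Call request_fn until it returns something other than 503.

    Waits base_delay, then 2x, 4x, ... between attempts (exponential
    backoff). request_fn is any callable returning an HTTP status code.
    """
    for attempt in range(max_retries + 1):
        status = request_fn()
        if status != 503:
            return status
        if attempt == max_retries:
            break
        # Double the wait on each failed attempt: 1s, 2s, 4s, ...
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("service still overloaded after %d retries" % max_retries)
```

For example, if the service returns 503 twice and then 200, the client sleeps roughly 1 s and then 2 s before the third attempt succeeds.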

    Please note this is rarely due to user queries timing out.  Certainly if you have more queries than your search unit(s) can handle, queries will start to show higher latency, but an easy solution to that is to increase your replica count. 

    Please also note that adding partitions has the advantage of increasing your data ingestion rate. 

    I hope that helps.


    Sr. Program Manager, SQL Azure Strategy - Blog

    Tuesday, December 2, 2014 11:40 PM