Azure Table used as a Queue - will this work?

  • Question

  • I'm working on a subsystem that requires retry logic. Any given operation may require a retry to be scheduled to run at some point in the future - anywhere from 1 minute to several hours out. I don't think I can use an Azure Queue for this, as there is no particular ordering to the times - e.g., I might enqueue 50 retries for two hours in the future, and then a retry for one minute in the future. In that case, the one-minute retry would be "hung up" behind the 50 retries that don't happen for two hours.

    I could use SQL Azure for this, but I'm not currently using it. I don't want to introduce another major component for something so minor.

    So, I'm thinking I'll use an Azure Table as a quasi-queue instead. I'm hoping for a little architecture review - if anyone sees a problem with this, please let me know before I waste too much time. :)

    I figure I'll use a constant PartitionKey. The RowKey will be the Ticks for when the retry should execute - i.e., RetryAtTime.Ticks. That way I can easily get the next items that need to execute. The big problem is concurrency - making sure only one role instance executes a given operation. My plan is to add a LockedUntilDate column and leverage the optimistic concurrency mechanism. The process to get operations to retry would look like this:

    1) Get the recent entities from the table that might need to be updated - i.e., (DateTime.UtcNow > RetryAtTime && LockedUntilDate < DateTime.UtcNow).

    2) For eligible items, set LockedUntilDate = DateTime.UtcNow.AddMinutes(5).

    3) Update all the eligible items in a batch. If another role instance has updated those items, the update will fail and this role instance just goes back to sleep. If the update succeeds, then this role instance can work those items.

    4) Process items - delete or reschedule items as necessary.

    There's obviously a little slop in the system. In step #3, if just one of the items in the batch fails, the whole batch will fail, and even eligible items won't get worked on. But in this case I'm willing to live with that.
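The four steps above can be sketched in Python with an in-memory dict standing in for the Azure Table; the ETag check mimics what the storage service does via the If-Match header on a conditional update, and all names here (row_key, try_claim, etc.) are illustrative, not the real SDK. One detail worth noting: RowKeys compare as strings, so the tick value should be zero-padded.

```python
LEASE_SECONDS = 300  # step 2: lock items for 5 minutes

# In-memory stand-in for the Azure Table: row_key -> (entity dict, etag).
# The real service performs the same check-and-swap via an If-Match ETag.
table = {}

def row_key(retry_at):
    # RowKeys compare lexicographically, so zero-pad the tick value
    # (100ns units here) to keep string order identical to numeric order.
    return f"{int(retry_at * 10**7):019d}"

def insert(retry_at, payload):
    table[row_key(retry_at)] = ({"RetryAtTime": retry_at,
                                 "LockedUntilDate": 0.0,
                                 "Payload": payload}, 0)

def query_eligible(now):
    """Step 1: rows that are due and whose lock has expired, oldest first."""
    return [(key, entity, etag)
            for key, (entity, etag) in sorted(table.items())
            if entity["RetryAtTime"] <= now and entity["LockedUntilDate"] < now]

def try_claim(now):
    """Steps 2-3: extend LockedUntilDate on all eligible rows, all-or-nothing.

    If any row's ETag changed between the query and the update (trivially
    impossible in this single-threaded sketch, but the real race the
    storage service guards against), abort the whole batch.
    """
    eligible = query_eligible(now)
    if not eligible or any(table[k][1] != etag for k, _, etag in eligible):
        return []  # another instance got there first; go back to sleep
    for key, entity, etag in eligible:
        table[key] = (dict(entity, LockedUntilDate=now + LEASE_SECONDS), etag + 1)
    return [key for key, _, _ in eligible]

def complete(key):
    """Step 4: delete the row once the retry has been worked."""
    del table[key]
```

Rescheduling in step 4 would be a delete plus a fresh insert under the new RetryAtTime, since the RowKey encodes the time.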


    So what do you think? Is this feasible?

    Tuesday, August 30, 2011 1:42 AM

Answers

  • Why use a batch for #3? (Why not take them one at a time?)

    The only theoretical issue I can see with this is that your clocks may be out of sync, so it's possible for one instance to think a lease has expired while another instance thinks it still has exclusive access. This doesn't seem like a big deal to me, since clocks are fairly synchronized and you can always set the LockedUntilDate sufficiently far in advance so you don't have to worry about it. (Make sure you have ample time even if another clock is off by a few seconds.)
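The "sufficiently far in advance" sizing can be made concrete: if every clock may drift up to some bound from true time, two instances can disagree by twice that bound, so the lease must outlast the work by at least that much. A toy calculation (numbers illustrative):

```python
def min_lease_seconds(processing_time, max_clock_skew, safety_margin=0.0):
    # If each clock can be off by up to max_clock_skew from true time,
    # two instances can disagree by up to 2 * max_clock_skew, so a rival
    # may see the lease expire that much "early". Size the lease so the
    # claiming instance's exclusive window still covers the work.
    return processing_time + 2 * max_clock_skew + safety_margin
```

So for 4 minutes of expected work and clocks within 5 seconds of true time, anything at or above 250 seconds of lease is safe; the 5-minute LockedUntilDate in the original plan has plenty of headroom.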

    • Marked as answer by Wenchao Zeng Tuesday, September 6, 2011 7:33 AM
    Tuesday, August 30, 2011 3:45 AM

All replies

  •  

    Brian Reischl,

    Based on the solution given by Steve Marx, the only disadvantage of reading rows singly instead of in bulk is that the number of transactions will be higher; if your table operations are large and frequent, it may cost a little more.

    Alternatively, you could read the data from the table based on your timings and then feed it to a queue to control and coordinate concurrency.
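That alternative amounts to a single dispatcher moving due rows onto a queue, so per-row locking disappears and the queue coordinates the workers. A minimal sketch, with a plain list standing in for the Azure Queue and tuples standing in for table rows (names illustrative):

```python
def dispatch_due(rows, queue, now):
    """Move due rows onto the work queue; return the rows still pending.

    rows  : list of (retry_at, payload) tuples, standing in for the table
    queue : list standing in for the Azure Queue that workers consume from
    """
    pending = []
    for retry_at, payload in sorted(rows):
        if retry_at <= now:
            queue.append(payload)  # now visible to every worker instance
        else:
            pending.append((retry_at, payload))
    return pending
```

Note this assumes only one dispatcher instance polls the table (otherwise the same concurrency question reappears at the dispatch step), while any number of workers can safely consume from the queue.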

    Naveen Kumar V

    Tuesday, August 30, 2011 1:06 PM
  • Thanks for the replies, glad to know I didn't miss something glaringly obvious.

    @Steve - the reason for the batch in #3 is to reduce the number of transactions. I figured that saves time and money. For my current volume I don't expect many conflicts, so it's a win. But single updates might be worth doing if you had lots of workers.
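The saving is just transaction count; a toy comparison (entity group transactions are capped at 100 entities, all in one partition, so larger claims need multiple batches):

```python
def transactions(items, batch_size):
    # One entity-group transaction per batch versus one per item;
    # Azure Table batches max out at 100 entities in a single partition.
    return -(-items // batch_size)  # ceiling division

# e.g. claiming 50 retries: one batch of 50 is 1 transaction,
# while single updates would be 50 transactions.
```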


    Tuesday, August 30, 2011 1:58 PM