locked
Multiple Worker Role Instances and one SQL Azure database - is queue-like functionality possible? RRS feed

  • Question

  • Hello,

    I have a single SQL Azure database that contains a table in which each record identifies a web service URI (wsURI), a DateTime record of the next time that web service will need to be polled (NextPollTime), and a polling frequency value (integer, PollingFrequency) representing the number of seconds between desired pollings. In my application I have a Worker Role that calls a Stored Procedure in the SQL Azure database every 10 seconds to return the records where NextPollTime has already occurred (compares to Now). This was prototyped on a standard SQL Server system, and all works fine.

    My question is, if I have multiple instances of that Worker Role, is there any way that I can guarantee that only one of those instances gets data from the SQL Azure Stored Procedure? In other words, I need to use the database table as a type of queue, and I'm hoping that this can be done with some sort of table locking procedure. I am open to methods other than the Stored Procedure I currently use for conducting what is essentially a simple SELECT statement, I just need to make sure that only one of the n worker role instances, which I need in order to get the SLA, is able to get the result-set that initializes a host of other actions.

    Out of concerns of cost and performance, and because I would prefer to avoid building another layer into my application, I would prefer to not use the approach detailed here: http://social.msdn.microsoft.com/Forums/en-US/windowsazuredevelopment/thread/fd49a556-1f4b-4991-a61f-ddde7b6f96d8/ (cost in that I'm already paying for a SQL Azure DB so why not use it for this if I can as I won't incur transaction costs, and performance in that I've read but lost the links to articles explaining that SQL Azure is faster in this case, and speed is imperative in my application).

    Please let me know if there is a way of handling this, or if I can clarify my question in any way.

    Thanks in advance for your assistance.

    -Alex

    Wednesday, February 2, 2011 10:18 PM

Answers

  • Hi Alex,

    It isn't clear why you need to have only "one" worker role perform the tasks. Do you mean that a record should only be processed by one worker role at a time? Or do you truly mean to have the other worker roles in "standby" unless the primary worker fails?

    If you can have multiple worker roles process records then you could implement a timeout feature in your code (say, 1 minute if the tasks are quick to execute). Each worker role would take "n" records every 10 seconds (so no worker processes all the records) and updates a new datetime field at the time the records where taken. This method is not too hard to implement and works well for implementing a load-balancing mechanism.

    If you must have one and only one worker role perform the tasks, then you can do the same thing than above by removing the ROWCOUNT clause. All records would be assigned to one and only one worker role. 

    Here is a possible implementation using a column called SynchCol (not tested... not compiled either):

    DECLARE @time DATETIME
    SET @time = GETDATE()

    SET ROWCOUNT 3  -- limit to 3 records; remove if you want all records processed by the same worker role

    -- Update 3 record with a time stamp if record was released or if the timeout took place
    UPDATE table SET synchCol = @time FROM table WHERE synchCol IS NULL or DATEDIFF(n, synchCol < @time)  > 1

    SET ROWCOUNT 0

    SELECT * FROM table WHERE synchCol = @time  -- return my assigned records

    Then... when the process is finished in the worker role, you can either let the timeout expire, or set the records' synchCol's value to NULL.  Since the rows have the @time variable set in their synchCol column, you could execute something like this, where @time is the column's value of any record fetched:

    UPDATE table SET synchCol = NULL WHERE synchCol = @time

    Hope this helps.

     


    Herve Roggero, Blue Syntax MVP SQL Azure Co-Author: Pro SQL Azure
    • Marked as answer by abedig Sunday, February 13, 2011 11:12 PM
    Monday, February 7, 2011 3:49 PM
  • First, let me start by saying I am not a database developer.  The following works for us very well, but someone much better at database development than I am (which doesn't take much) coded this up.

    We do something like this at a client of mine, though the implementation currently isn't in Azure, but the approach should still work.  There are multiple services that all look at a SQL Server table that we are using as a type of queue.  The stored procedure (which I'll refer to as the pop procedure) basically works on the concept of a Row lock.  It asks for the next row it needs to process and the procedure updates a flag on the table indicating which processor has taken the work.  The pop procedure does an update within a transaction using the WITH (ROWLOCK, XLOCK) hint which basically gets a row level, exclusive lock on the row for the update.  This allows for one, and only one, execution of the procedure to get a row (wich is what you are looking for I think).

    You're probably better off asking this question to a SQL Server forum or in the SQL Azure forum, but essentially a solution that locks the rows exclusively in SQL should get you what you are looking for.  There are repercussions of using hints, so make sure to talk to a database developer about them (kind of like saying "talk to your tax attorney" about that great deduction you heard someone talking about).

    Hope this helps.


    - Mike Wood http://mvwood.com | http://nplus1.org | http://cinnug.org
    • Marked as answer by abedig Sunday, February 13, 2011 11:12 PM
    Friday, February 11, 2011 12:42 PM

All replies

  • Hi Alex,

    It isn't clear why you need to have only "one" worker role perform the tasks. Do you mean that a record should only be processed by one worker role at a time? Or do you truly mean to have the other worker roles in "standby" unless the primary worker fails?

    If you can have multiple worker roles process records then you could implement a timeout feature in your code (say, 1 minute if the tasks are quick to execute). Each worker role would take "n" records every 10 seconds (so no worker processes all the records) and updates a new datetime field at the time the records where taken. This method is not too hard to implement and works well for implementing a load-balancing mechanism.

    If you must have one and only one worker role perform the tasks, then you can do the same thing than above by removing the ROWCOUNT clause. All records would be assigned to one and only one worker role. 

    Here is a possible implementation using a column called SynchCol (not tested... not compiled either):

    DECLARE @time DATETIME
    SET @time = GETDATE()

    SET ROWCOUNT 3  -- limit to 3 records; remove if you want all records processed by the same worker role

    -- Update 3 record with a time stamp if record was released or if the timeout took place
    UPDATE table SET synchCol = @time FROM table WHERE synchCol IS NULL or DATEDIFF(n, synchCol < @time)  > 1

    SET ROWCOUNT 0

    SELECT * FROM table WHERE synchCol = @time  -- return my assigned records

    Then... when the process is finished in the worker role, you can either let the timeout expire, or set the records' synchCol's value to NULL.  Since the rows have the @time variable set in their synchCol column, you could execute something like this, where @time is the column's value of any record fetched:

    UPDATE table SET synchCol = NULL WHERE synchCol = @time

    Hope this helps.

     


    Herve Roggero, Blue Syntax MVP SQL Azure Co-Author: Pro SQL Azure
    • Marked as answer by abedig Sunday, February 13, 2011 11:12 PM
    Monday, February 7, 2011 3:49 PM
  • There isn’t a whole lot of locking capability available to you in the Windows Azure platform; the cloud computing way of doing things leans toward optimistic concurrency rather than holding resources hostage by locking them. To quote Steve Marx, “nothing distributed is atomic.” However, sometimes you do need something like a lock. The best way I have found to “lock” something in Windows Azure is to create a blob in Windows Azure storage as a lock resource and use the Blob Lease feature to set, release, and check the lock. Blob leases are pretty coarse (1 minute at a time, but renewable and releasable) so this isn't appropriate for everything.


    David Pallmann GM Application Development, Neudesic Windows Azure MVP
    Monday, February 7, 2011 3:55 PM
  • First, let me start by saying I am not a database developer.  The following works for us very well, but someone much better at database development than I am (which doesn't take much) coded this up.

    We do something like this at a client of mine, though the implementation currently isn't in Azure, but the approach should still work.  There are multiple services that all look at a SQL Server table that we are using as a type of queue.  The stored procedure (which I'll refer to as the pop procedure) basically works on the concept of a Row lock.  It asks for the next row it needs to process and the procedure updates a flag on the table indicating which processor has taken the work.  The pop procedure does an update within a transaction using the WITH (ROWLOCK, XLOCK) hint which basically gets a row level, exclusive lock on the row for the update.  This allows for one, and only one, execution of the procedure to get a row (wich is what you are looking for I think).

    You're probably better off asking this question to a SQL Server forum or in the SQL Azure forum, but essentially a solution that locks the rows exclusively in SQL should get you what you are looking for.  There are repercussions of using hints, so make sure to talk to a database developer about them (kind of like saying "talk to your tax attorney" about that great deduction you heard someone talking about).

    Hope this helps.


    - Mike Wood http://mvwood.com | http://nplus1.org | http://cinnug.org
    • Marked as answer by abedig Sunday, February 13, 2011 11:12 PM
    Friday, February 11, 2011 12:42 PM
  • Thank you for your response. Your suggestion of Rowlocks as in SQL Server seems to be a possible solution. However, having explored this question more thoroughly, I now see the possibility for using the Azure ServiceBus to achieve better performance in my application by configuring the instances to fire sequentially, thereby guaranteeing (if I space the instances sufficiently and implement a timeout on the query) that I do not have two instances executing the sp_GetReadyRecords procedure simultaneously. If I end up failing with that, I will be back to Rowlocking.

     

    Thanks again,

    Alex

    Sunday, February 13, 2011 11:04 PM
  • Thanks for your response Herve, 

    Your question is a good one - the most important functionality I am trying to achieve is making sure that only one instance processes each record when its NextPollTime has expired, and that the processing begins as close to this time as possible - what the other worker role is doing meanwhile is less important. I have decided to pursue using the ServiceBus to coordinate the efforts of the worker roles, splitting their execution times over time.

    Sunday, February 13, 2011 11:11 PM
  • Thanks David. I think that suggestion might work, however the coarseness of the Blob lease does not seem like it will be sufficient for my application. Thanks for the suggestion though - your reminder that "nothing distributed is atomic" is one of several reasons I am re-examining my approach here.
    Sunday, February 13, 2011 11:14 PM
  • Good luck with your solution.  I may not completely grasp your plan, but one caution I would give you on using the approach you are looking at is that it all depends on timing.  As the number of instances increases the timing aspect will become much more critical.  Eventually, you'll have an overlap and two instances will run the procedure at the same time. 

    In fact, it won't actually take more instances to show the issue.  If you rely solely on timing you could run into problems when one of the instances has a hardware failure, or is moved by the fabric and spun up somewhere else.  The timings could then get thrown off. 

    I'd highly recommend using something, either the BLOB lease option, rowlocking, or some other way of handling a critical section across multiple processes/instances in order to ensure you get your one execution at a time goal.  If you leave it up to only timing, eventually, over time you'll get bit.


    - Mike Wood http://mvwood.com | http://nplus1.org | http://cinnug.org
    Monday, February 14, 2011 12:32 AM
  • While you may be right, I hope to use inter-role instance communication via the service bus, along with the ServiceBusRelayMode.Eventing option, to have my role instances of each type adjust their timing during Role Recycle events. Given the fact that I am using non-blocking timers to control the action, I think I can see a way to make this work. 
    Tuesday, February 15, 2011 10:56 PM