Multiple instances of worker roles

  • Question

  • Hello,

     

    I'm working on an Azure project where a worker role has the task of periodically (at a configurable interval) accessing a web service on an external application.
    This works fine if I have only one instance of my worker role. But if I increase the number of instances, each instance will access the external web service, and I only want this task to be performed once at each interval, not once by each instance.
    The reason I want several instances is that the same worker role is also responsible for serving a WCF application that needs load balancing, and I also need several instances to get the SLA promised by Microsoft.

    Does anyone have any good ideas on how I can synchronize this? I have read some posts suggesting that I can use a message queue for this, but the thing is that I have no way of placing the messages on the queue, unless I create a second worker role that runs a single instance, with the only mission of placing messages on the queue. But then I will get extra compute charges for a very small role...

    Another issue is the difference between the staging and production environments. I only want the periodic polling task to be performed in the production environment, not in the staging environment. How can I tell if the worker role is running in staging?

    Thanks,
    Tor-Odd

     


    Tor-Odd Connelly
    Wednesday, November 17, 2010 2:33 PM

Answers

  • You could have a blob that stores a timestamp with the time that the web service was last accessed. When the scheduled task triggers, your worker role instance can read the blob's contents to determine if the service has already been accessed for the current interval.

    If it hasn't, the role instance can attempt to acquire a lease on this blob and if it succeeds, it can access the web service, update the timestamp and release the lease on the blob.

    If the role fails to acquire the lease, it means that another instance is already accessing the web service. For robustness, in case the other instance crashes or cannot access the service, it can keep retrying until it can acquire the lease and confirm that the service has already been called.
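
    A minimal sketch of this idea, as a method you might drop into the worker role class. It assumes the container and a blob holding the last-run timestamp already exist; the CallExternalWebService helper is a hypothetical placeholder, and the lease calls shown are from later versions of the .NET storage client (Microsoft.WindowsAzure.Storage) - the 2010 SDK only exposed leases through the REST protocol layer.

        using System;
        using System.Globalization;
        using Microsoft.WindowsAzure.Storage;
        using Microsoft.WindowsAzure.Storage.Blob;

        static void RunTaskIfDue(CloudBlockBlob blob, TimeSpan interval)
        {
            // The blob body is the UTC time of the last run, stored as a round-trippable string.
            DateTime lastRun = DateTime.Parse(blob.DownloadText(), null,
                                              DateTimeStyles.RoundtripKind);
            if (DateTime.UtcNow - lastRun < interval)
                return;                                // already handled this interval

            string leaseId;
            try
            {
                leaseId = blob.AcquireLease(TimeSpan.FromSeconds(60), null);
            }
            catch (StorageException)
            {
                return;                                // another instance holds the lease
            }

            var lease = AccessCondition.GenerateLeaseCondition(leaseId);
            try
            {
                // Re-read after acquiring: another instance may have finished meanwhile.
                lastRun = DateTime.Parse(blob.DownloadText(), null,
                                         DateTimeStyles.RoundtripKind);
                if (DateTime.UtcNow - lastRun >= interval)
                {
                    CallExternalWebService();          // hypothetical helper for the real work
                    blob.UploadText(DateTime.UtcNow.ToString("o"), null, lease);
                }
            }
            finally
            {
                blob.ReleaseLease(lease);
            }
        }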

    • Marked as answer by Tor-Odd Friday, November 19, 2010 7:41 AM
    Wednesday, November 17, 2010 3:20 PM

All replies

  • Hi,

    First of all, you don't need to create another worker role to insert messages into a queue. Since you already have an external application (the one serving the web service you are connecting to), you can use that application to insert messages into an Azure queue. Windows Azure storage services can be used entirely without compute roles: you can access them (including Azure Queues) from anywhere in the world, given an internet connection. And you definitely don't need to put a timestamp in a blob.

    So: your external app puts a message in an Azure queue, which instructs your worker role to do the work. You set up your worker role to read the queue at regular intervals. The first instance to get the message will process the task. Once a message has been retrieved it is no longer visible to the other instances, so the next instance to read the queue will find it empty. You should be quite OK with that approach; a sketch of the consuming side follows.
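
    Roughly like this, assuming a queue named "tasks" and a connectionString in scope (both illustrative), with CallExternalWebService as a hypothetical placeholder for the real work; the API shown is from later versions of the .NET storage client:

        using System;
        using Microsoft.WindowsAzure.Storage;
        using Microsoft.WindowsAzure.Storage.Queue;

        // Each instance polls the queue. GetMessage hides the message from the
        // other instances for the visibility timeout, so only one instance
        // ends up processing it.
        CloudQueue queue = CloudStorageAccount.Parse(connectionString)
                                              .CreateCloudQueueClient()
                                              .GetQueueReference("tasks");

        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(5));
        if (msg != null)
        {
            CallExternalWebService();   // hypothetical helper: do the actual work
            queue.DeleteMessage(msg);   // remove it for good once the work is done
        }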

    As for the second requirement (process only in production, not in staging): you can solve this with the configuration file. The service configuration is something that may be changed while your service is running, so put a setting in it that flags whether or not your worker should do the processing, with a default of false. Once you promote staging to production, you just change that setting in the config file. Your role will recycle automatically (if you haven't changed the original WorkerRole template) and the new value will be read; then your worker role will start processing the task. A sketch follows below.
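
    For example (the setting name "EnablePolling" is made up; it would be declared in the .csdef and given a per-deployment value in each .cscfg):

        using Microsoft.WindowsAzure.ServiceRuntime;

        // Read the flag from the service configuration; the staging
        // deployment's .cscfg would leave it set to false.
        bool enabled = bool.Parse(
            RoleEnvironment.GetConfigurationSettingValue("EnablePolling"));

        if (!enabled)
            return;   // skip the periodic task in this deployment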

    Hope this helps!

    Wednesday, November 17, 2010 5:14 PM
  • If I understand the problem correctly, the objective is to guarantee that only one of the worker role instances calls the web service at regular intervals while preventing the remaining instances from doing so. While reading messages from a queue solves the concurrency issue by allowing only one of the instances to retrieve a message, it also means that you’ve introduced a second actor into the system whose only purpose is to periodically post a message to the queue.

    One of the possibilities that you mention is using the system hosting the web service to do this, but I don’t think that the original poster ever mentioned that this system is under his control. And while the requirements for the message dispatcher are not very demanding—I’m sure that you could use your old Pentium computer for this—it also means that it is a single point of failure.

    If the message dispatcher crashes or loses connectivity, then the web service doesn’t get called at all. Since the original poster is using more than one instance of his worker role to guarantee the SLA, I assume that he also cares that the web service gets called every time. Guaranteeing the message dispatcher’s availability, even if its resource demands are not very high, means having to host it on more than one computer, provide uninterrupted power, and ensure redundant Internet connectivity. So you might as well use two worker roles hosted in Azure for what is essentially a timer.

    “And you definitely don't need to put a timestamp in a Blob.”

    I’m not sure why you believe that this is the wrong thing to do. A worker role is perfectly capable of triggering a task every N minutes, so it can handle the task of periodically polling the blob. It doesn’t need a second system to tell it when to access the web service. The only remaining problem that needs to be considered is that of concurrency, which I believe is handled adequately by using the blob lease mechanism. By having each role handle its own timing, the only requirement is that you have at least one role running at all times to guarantee that the web service will be called, without the need for any external dependencies, which, in my opinion, just make the system more fragile.

    Thursday, November 18, 2010 12:40 AM
  • “And you definitely don't need to put a timestamp in a Blob.”

    “I’m not sure why you believe that this is the wrong thing to do. A worker role is perfectly capable of triggering a task every N minutes, so it can handle the task of periodically polling the blob. It doesn’t need a second system to tell it when to access the web service. The only remaining problem that needs to be considered is that of concurrency, which I believe is handled adequately by using the blob lease mechanism. By having each role handle its own timing, the only requirement is that you have at least one role running at all times to guarantee that the web service will be called, without the need for any external dependencies, which, in my opinion, just make the system more fragile.”

    Hello again,

    I may have misunderstood the original requirement. I thought that the worker role does not know when to process the service call, because there are cases when you only want to process something that really needs to be processed. For executing a task every fixed interval of time, you can check out this thread:

    http://social.msdn.microsoft.com/forums/en-us/windowsazure/thread/9ED06FF2-4ED9-4369-B814-AA3500E3C4EC

    Now for the syncing: I don't know why you want to use a blob to sync role instances, when the queue is perfect for that. Blob is what the acronym stands for (Binary Large Object). I am not saying you can't do it with a blob; I am saying it is more efficient to use a queue.

    So, back to the issue. You make the role instances work with timers (as in the thread above). Then, once the timer elapses, check the queue, as you would check the blob. If there are no messages in the queue, put a message saying "hey, I got it, I am working on it". If a message is already in the queue, just put it back and don't do the work. You really only need to check whether there are any messages in the queue.

    There is no difference whether you use a blob or a queue. You can even use an Azure Table with TimeStamp and Role Instance ID columns: just add records to it. Do not bother to block, lease or delete anything. Just before you begin processing, check the last record; if it is timestamped within the last 2 minutes, do not do anything. If it is older, make a new record and begin working. (A sketch follows below.)
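
    A sketch of that table approach with the later .NET table client. The entity shape, table seeding and the two-minute window are assumptions, and note that this check-then-insert is not atomic, a point that comes up later in the thread:

        using System;
        using System.Linq;
        using Microsoft.WindowsAzure.Storage.Table;

        // One row per run; RowKey is inverted ticks so the newest run sorts first.
        public class RunEntity : TableEntity
        {
            public RunEntity() { }
            public RunEntity(DateTime runAt, string instanceId)
            {
                PartitionKey = "runs";
                RowKey = (DateTime.MaxValue.Ticks - runAt.Ticks).ToString("D19");
                RunAt = runAt;          // our own DateTime column, as suggested above
                InstanceId = instanceId;
            }
            public DateTime RunAt { get; set; }
            public string InstanceId { get; set; }
        }

        static bool TryClaimRun(CloudTable table, string instanceId)
        {
            var newest = table.ExecuteQuery(new TableQuery<RunEntity>().Take(1))
                              .FirstOrDefault();
            if (newest != null && DateTime.UtcNow - newest.RunAt < TimeSpan.FromMinutes(2))
                return false;           // a recent run exists; do nothing

            table.Execute(TableOperation.Insert(new RunEntity(DateTime.UtcNow, instanceId)));
            return true;                // this instance does the work
        }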

    The issue with the blob would be: who is going to create that blob initially? There is another issue with the lease: the lease lasts for one minute. What happens if the external web service call lasts for more than one minute? Are you going to start a timer when you lease the blob, to renew the lease every minute until the service call ends? In that time all the other instances will be trying to get the blob; they will actually be trying to gain a lease. So once the original worker finishes and releases the lease, the next one will get the lease (while the third one, if it exists, will still be trying). It will get the lease, get the blob, see that the task was processed just a few seconds ago, and release the lease. And so on for the next.

    And by the way, if you end up with the blob, you will have to convert from System.DateTime to a byte array, because a blob stores only bytes. When you get the blob you essentially get a byte array, so you have to convert between System.DateTime and byte[]: first convert the DateTime to a string, then convert that to a byte array; on the way back, convert the byte array to a string, then use DateTime.Parse to turn it back into a DateTime instance. With a (strongly typed) table you can just put a record (or update a single one), then read the table and check the TimeStamp field. And please note that I suggest you create your own field of type DateTime and not rely on the built-in Timestamp field of the table entity (http://msdn.microsoft.com/en-us/library/dd179338.aspx).
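
    For reference, the round trip being described is only a couple of lines of standard .NET:

        using System;
        using System.Globalization;
        using System.Text;

        // DateTime -> string -> bytes for the blob body...
        byte[] body = Encoding.UTF8.GetBytes(DateTime.UtcNow.ToString("o"));

        // ...and bytes -> string -> DateTime when the blob is read back.
        DateTime stamp = DateTime.Parse(Encoding.UTF8.GetString(body),
                                        null, DateTimeStyles.RoundtripKind);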

    In the end, the Azure Table seems to me the most robust way to sync role instances.

    Hope this helps more than it confuses ;)

    Thursday, November 18, 2010 7:36 AM
  • Hello,

    Thank you all for being helpful with this challenge.

    First of all, I would like to clarify the case. It is correct that the external system should not have any knowledge of the inner workings of my system, and it's also correct that I need to call the external web service at regular intervals. However, if I "miss" the occasional call, that is not a big problem, but I need to be sure that the whole process never stops. Hence, I cannot rely on any other system to place messages on a queue in my system.

    Anton: I agree with you that it should not be necessary to use blob leases to solve this. However, won't your approach of using a message queue introduce another problem with synchronizing access to the queue? As far as I know, there is no queue method like "AddIfNotExists" that executes as one atomic operation. So how can I be certain that two instances don't place the "I'm working" message on the queue at the same time? Besides this, I think your approach looks like the correct way to go.

    Fernando: I've tried the blob lease approach, but I'm having problems making sure that the initial blob gets created by only one instance. This I can of course handle by initially running the role with only one instance and afterwards increasing to more instances. But I'm struggling with getting the lease methods to work as they're supposed to... I will need to investigate this some more.

    Anyway, I'm still not sure what the correct approach is. To me it seems that some sort of scheduling component should be added to the Azure platform.

     


    Tor-Odd Connelly
    Thursday, November 18, 2010 8:23 AM
    “But I'm struggling with getting the lease methods to work as they're supposed to...”

    Note that there is a bug whereby the Storage Service incorrectly reports a blob as locked if it has been leased and the lease has expired. You are still able to lease the blob in this case. It works correctly if the lease has been released; I don't know what happens if the lease has been broken.

    Thursday, November 18, 2010 4:34 PM
    Answerer
    “Now for the syncing: I don't know why you want to use a blob to sync role instances, when the queue is perfect for that. Blob is what the acronym stands for (Binary Large Object). I am not saying you can't do it with a blob; I am saying it is more efficient to use a queue.”


    Thank you. I know what the blob acronym stands for, although I don’t think that you should put too much weight on the name. Nowadays, the term blob is used much more loosely and is not limited to large binary objects. Or else, where do you put your small binary objects then? :)

    I’ve already explained why I suggested using blobs to synchronize the role instances and why I thought that queues were not so suitable in this case. It requires you to add another host, and another failure point, merely to post items into the queue, and nothing else really. Moreover, the messages are not the result of some external process, which would justify obtaining this information from the outside, but instead a periodic sequence of messages with no meaning other than triggering an event. A worker role is perfectly capable of generating its own trigger internally.

    I insist that the only real problem is synchronization. Why a blob lease? If you needed to synchronize within the role, you would use a monitor or a mutex. I imagine that you wouldn’t find that so out of place. But if you need synchronization between role instances, then you need its distributed equivalent, which is what a blob lease gives you. Think of it as a distributed lock. 

    “So, back to the issue. You make the role instances work with timers (as in the thread above). Then, once the timer elapses, check the queue, as you would check the blob. If there are no messages in the queue, put a message saying "hey, I got it, I am working on it". If a message is already in the queue, just put it back and don't do the work. You really only need to check whether there are any messages in the queue.”

    This is prone to race conditions. One role instance checks whether there are messages in the queue, and finds none. Simultaneously, another role instance is doing the same check and also finds that there are no messages in the queue. Now they both put a message in the queue and they both call the web service. Also, from the description, it appears that everyone is putting messages in the queue and they are never actually taken out. If the queue is empty, you put a message. If you find a message in the queue, you put it back...

    “There is no difference whether you use a blob or a queue. You can even use an Azure Table with TimeStamp and Role Instance ID columns: just add records to it. Do not bother to block, lease or delete anything. Just before you begin processing, check the last record; if it is timestamped within the last 2 minutes, do not do anything. If it is older, make a new record and begin working.”

    Yes, I agree that you could use the timestamp on the blob to determine when it was last updated. In that case, the contents of the blob are irrelevant. I would not, however, use a table and add a new row every time; otherwise, you would need to periodically take care of purging the table or it would grow without control. Unless you need some kind of audit log, I don't see the benefit of this approach.

    “The issue with the blob would be: who is going to create that blob initially?”

    I don’t see this as a problem. Anyone can make the blob. It doesn’t even have to be a role instance that creates it. You could do it when you provision your application. If this is a problem, give me your storage key and I’ll create it for you now :). Once the blob is created, you only write to it. You never delete it.

    “There is another issue with the lease: the lease lasts for one minute. What happens if the external web service call lasts for more than one minute? Are you going to start a timer when you lease the blob, to renew the lease every minute until the service call ends? In that time all the other instances will be trying to get the blob; they will actually be trying to gain a lease. So once the original worker finishes and releases the lease, the next one will get the lease (while the third one, if it exists, will still be trying). It will get the lease, get the blob, see that the task was processed just a few seconds ago, and release the lease. And so on for the next.”

    This happens only once for every interval and remember that a role needs to take the lease only if it finds that the blob timestamp is unchanged. Unless the interval is *extremely* short, this shouldn’t be a problem. 

    “And by the way, if you end up with the blob, you will have to convert from System.DateTime to a byte array, because a blob stores only bytes. When you get the blob you essentially get a byte array, so you have to convert between System.DateTime and byte[]: first convert the DateTime to a string, then convert that to a byte array; on the way back, convert the byte array to a string, then use DateTime.Parse to turn it back into a DateTime instance. With a (strongly typed) table you can just put a record (or update a single one), then read the table and check the TimeStamp field. And please note that I suggest you create your own field of type DateTime and not rely on the built-in Timestamp field of the table entity (http://msdn.microsoft.com/en-us/library/dd179338.aspx).”

    I don’t know. I find that converting between data types is not as challenging as you make it sound. In any case, the timestamp is really a mechanism to detect whether the blob was modified since you last read it. You could probably get away with storing a long integer that is incremented every time a role accesses the web service. Roles can compare the last value they read with the current value to determine whether the job is pending.

    It seems that we are not going to agree on this subject. Let's hope that at least we've given Tor-Odd some ideas that will help him find a solution to his problem.

     

    • Edited by Fernando Tubio Friday, November 19, 2010 5:12 AM Fixed typo
    Friday, November 19, 2010 5:07 AM
    “But I'm struggling with getting the lease methods to work as they're supposed to...”

    “Note that there is a bug whereby the Storage Service incorrectly reports a blob as locked if it has been leased and the lease has expired. You are still able to lease the blob in this case. It works correctly if the lease has been released; I don't know what happens if the lease has been broken.”


    Yes, I've read that this is the case with the development storage, but is this also the case for the production storage?

    http://social.msdn.microsoft.com/Forums/en/windowsazuredata/thread/9ae25614-b1da-43ab-abca-644abc034eb3

    In the above thread it's stated that this is not a problem in production storage. Which is correct?


    Tor-Odd Connelly
    Friday, November 19, 2010 7:26 AM
    The final solution seems to be to use a blob that contains a timestamp for when the next web service call should be made. In my worker role, I have a loop that periodically tries to acquire a lease on the blob and, if it succeeds, checks whether the time has come to do the scheduled processing. If so, I create a new worker thread that does the actual work (because it might take more than one minute, which is the lease expiry time). While the worker thread is doing its work, I make sure to renew the lease every 50 seconds. When the worker thread is finished, I update the blob with the timestamp for when the next processing should be done, and finally release the lease.
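
    In outline, the loop looks something like this. The lease calls are shown with the later .NET storage client (the 2010 SDK required them as raw REST calls), and blob, interval, pollInterval and DoScheduledWork are placeholders for the actual members:

        using System;
        using System.Globalization;
        using System.Threading;
        using Microsoft.WindowsAzure.Storage;
        using Microsoft.WindowsAzure.Storage.Blob;

        while (true)
        {
            string leaseId = null;
            try { leaseId = blob.AcquireLease(TimeSpan.FromSeconds(60), null); }
            catch (StorageException) { /* lease held by another instance */ }

            if (leaseId != null)
            {
                var lease = AccessCondition.GenerateLeaseCondition(leaseId);
                DateTime nextRun = DateTime.Parse(blob.DownloadText(), null,
                                                  DateTimeStyles.RoundtripKind);
                if (DateTime.UtcNow >= nextRun)
                {
                    // Run the long task on its own thread and keep the lease
                    // alive while it works.
                    var work = new Thread(DoScheduledWork);
                    work.Start();
                    while (!work.Join(TimeSpan.FromSeconds(50)))
                        blob.RenewLease(lease);            // renew before the 60 s expiry

                    blob.UploadText(DateTime.UtcNow.Add(interval).ToString("o"),
                                    null, lease);          // when the next run is due
                }
                blob.ReleaseLease(lease);
            }
            Thread.Sleep(pollInterval);
        }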

    Thanks again.


    Tor-Odd Connelly
    Friday, November 19, 2010 7:40 AM
    “While the worker thread is doing its work, I make sure to renew the lease every 50 seconds.”

    What happens if the worker instance crashes?

    Another option would be a queue with a single message. All instances poll this queue and try to get the message with a visibility timeout close to the desired periodicity - assuming it is under two hours, the maximum value for the visibility timeout. The winning instance does the processing. It does nothing with the message, and the visibility timeout is simply allowed to expire, at which point the message reappears and the whole process starts again with the instances competing to get it.

    This has the advantage that everything continues to work even if the worker instance crashes, because it does not require the winning instance to do anything at all to the message. Of course, there is still the issue of handling worker instance failure from an application functionality perspective. Essentially, this solution implicitly updates the next timestamp through the visibility timeout rather than explicitly updating it in a blob. A sketch follows below.
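
    A minimal sketch, assuming the queue has been seeded once with a single dummy message, that period is the desired interval (under two hours), and with CallExternalWebService as a hypothetical placeholder for the real work:

        using System;
        using Microsoft.WindowsAzure.Storage.Queue;

        // Whoever dequeues the message wins; every other instance gets null
        // and skips this round. The message is never deleted and never touched
        // again, so it reappears by itself once the visibility timeout
        // (= the period) expires.
        CloudQueueMessage msg = queue.GetMessage(visibilityTimeout: period);
        if (msg != null)
        {
            CallExternalWebService();   // hypothetical helper
        }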

    Friday, November 19, 2010 8:21 AM
    Answerer