WorkerRole per process

  • Question

  • Hi

    Is it good to have a single WorkerRole for each process?

    Example: I send email over a queue. I just insert an email structure into the queue; a worker role watches the queue, and when it finds a new record, it transforms it into a real email and sends it.

    Is it good to have one worker role for this process?

    I have several queues in my solution and I don't know the best approach: one worker role that handles all queues, or one role for each queue?

    (It's not only queues, but I use queues a lot; it's a nice feature.)

    Thanks
    Pavel

    Monday, June 18, 2012 11:49 AM
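    The pattern described in the question can be sketched in Python, with an in-memory queue standing in for an Azure storage queue (the function names and addresses below are illustrative, not part of any SDK):

```python
import json
import queue

# Stand-in for an Azure storage queue; a real worker role would poll the
# queue service instead of a local queue.Queue.
email_queue = queue.Queue()

def enqueue_email(to, subject, body):
    """Producer side: serialize the email structure and put it on the queue."""
    email_queue.put(json.dumps({"to": to, "subject": subject, "body": body}))

def process_one():
    """Worker side: take one message, turn it into a real email, and 'send' it."""
    msg = email_queue.get_nowait()
    email = json.loads(msg)
    # A real worker would hand this to an SMTP client or a mail service here.
    return f"sent '{email['subject']}' to {email['to']}"

enqueue_email("pavel@example.com", "Hello", "Queue-based mail works.")
print(process_one())
```

    The point of the pattern is that the producer only serializes a structure; all the slow work (building and sending the real mail) happens in the worker.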

Answers

  • Both are valid approaches, each with its own limitations. If you have a worker role for each piece of functionality, you can scale or upgrade them independently and get a good level of isolation. If instead you combine multiple functionalities into one worker role, you may reach much better utilization, particularly in periods of low activity, but you may need a bit more work in the worker role to get a stable implementation. I've seen both ways implemented and working quite well; which is better depends on the usage pattern and the functionality in general.
    • Marked as answer by Orik007 Monday, June 18, 2012 7:59 PM
    Monday, June 18, 2012 2:13 PM
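    The combined option above amounts to a single loop that polls each queue in turn. A minimal sketch, again with in-memory queues standing in for Azure storage queues and a made-up dispatch function:

```python
import queue

# One worker role, several queues: poll each in round-robin order so no
# queue is starved while another is busy.
queues = {
    "email": queue.Queue(),
    "image-resize": queue.Queue(),
}

def handle(name, msg):
    # Dispatch point: in a real role each queue would get its own handler.
    return f"{name} handled {msg!r}"

def poll_once():
    """One pass over all queues; returns what was processed."""
    results = []
    for name, q in queues.items():
        try:
            msg = q.get_nowait()
        except queue.Empty:
            continue  # nothing waiting on this queue; move on
        results.append(handle(name, msg))
    return results

queues["email"].put("welcome mail")
print(poll_once())
```

    The extra work the answer alludes to lives in `poll_once`: backing off when all queues are empty, and making sure one busy queue can't monopolize the loop.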
  • Hi Pavel -

    Just wanted to agree with the other response.  The two big factors here are isolation and cost.  There's also a third, latency, which plays into isolation as well.

    1) If isolation is important (because you need to scale or upgrade one process far more often than another), that leans toward separate workers for the processes that need the additional isolation.  It keeps the worker roles simple, lets you upgrade just one of them, and lets you scale one out while leaving the others alone.

    Regarding latency: if you need the workers to process work items within a certain amount of time, then combining the worker roles puts that at risk.  If a worker's resources are tied up on one long-running operation from one queue, it will be slower to pick up a message from another queue and process that.  You can improve this by scaling out, but even then you may need to be careful that all of your worker instances aren't working on one type of message while ignoring the others.  Of course, if you don't care about the amount of time items spend in the queues, or if that's really flexible, then this might not matter to you.

    Additionally, you can implement patterns of queue access that limit the chance of this happening.  One is updating the queue message with a status indicator.  A worker doing this could decide that it will spend at most 30 seconds on one message, save its current processing status into the queue message, and then go get a different message from a different queue.  The message it saved will become visible again after its visibility timeout, and another worker will be able to process it beginning where the first one left off.
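    That checkpointing idea can be sketched as follows. A plain dict stands in for an Azure queue message, and writing the state back into its body simulates what UpdateMessage would do; the 30-second budget comes from the post, everything else is illustrative:

```python
import json
import time

TIME_BUDGET = 30  # seconds a worker will spend on one message, per the post

def process_with_checkpoint(message, work_items, budget=TIME_BUDGET,
                            clock=time.monotonic):
    """Work through items, but stop once the budget is spent and save
    progress into the message body (mimicking UpdateMessage).  Returns
    True when every item is done, False if another worker must resume."""
    state = json.loads(message["body"])
    deadline = clock() + budget
    while state["next"] < len(work_items) and clock() < deadline:
        work_items[state["next"]]()       # do one unit of work
        state["next"] += 1                # checkpoint after each unit
    message["body"] = json.dumps(state)   # a later worker resumes from here
    return state["next"] == len(work_items)
```

    When the visibility timeout expires on a partially processed message, the next worker that dequeues it reads `state["next"]` and picks up where the first left off.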

    2) However, this may add cost.  If you have 10 processes, each of which consumes very few resources, and very infrequently (say each processes 5 queue messages per day, each taking 5 seconds of CPU time), then the CPU capacity of one box is likely enough to process all of the messages quite quickly.  For uptime considerations, the usual recommendation is to run 2 instances of each role.  With a worker for each process, that's 2 instances for each of 10 workers, meaning 20 total instances.  If you combine them, you could run just 2 instances.  You might choose a larger VM size to get more horsepower in the combined case if you need it.
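    The instance arithmetic above, written out (prices are deliberately left out, since the post doesn't quote any):

```python
processes = 10
min_instances_per_role = 2  # the usual uptime recommendation

separate = processes * min_instances_per_role  # one worker role per process
combined = min_instances_per_role              # everything in one role

print(separate, combined)  # 20 vs 2: a 10x difference in instance count
```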

    To summarize, it's really a balance, and you may end up deciding on some hybrid, where a single role that's expected to be updated more frequently or scaled more broadly is built as its own worker role, while the rest are combined.  Both solutions will work, it's really just a matter of what priorities you have.

    Glad you like queues!  Hopefully you also saw that we've recently lowered the price of storage transactions, so your queue transactions are now even less expensive than they were before!  See here for more information: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/06/08/10x-price-reduction-for-windows-azure-storage-transactions.aspx


    -Jeff

    • Marked as answer by Orik007 Monday, June 18, 2012 7:59 PM
    Monday, June 18, 2012 4:44 PM

All replies

  • Thanks for the answers. I think I will stick with one worker for each queue. I like it more and it's easier to handle (I store serialized objects in the messages, so I always know there is only one type). So far I have only 3 queues in my application; maybe in the future I can merge more workers into one.

    Thanks.

    PS: I also like your lowered prices, because I only make for-fun applications with no income :-)
    (I'm on the 3-month trial right now)


    • Edited by Orik007 Monday, June 18, 2012 8:11 PM
    Monday, June 18, 2012 8:09 PM