Increasing role count takes site offline?

    Question

  • We are noticing that when we increase or decrease instance counts (for web and/or worker roles) via the management portal, the entire deployment becomes unavailable. The overall status is "Updating", and any browser requests made to the site during this period time out, usually displaying the "Site cannot be found" error.

    Is there another secret way to scale up/down without taking the site offline? Seems a bit odd that this happens...

    Thanks...

    March 13, 2012 1:00

Answers

All replies

  • Is the behavior the same regardless of how many instances you start with when scaling up? Increasing the role count is generally an online operation. The Azure Portal might indicate "Updating", but that generally does not mean that any of the existing instances go offline. Decreasing the instance count, on the other hand, is a slightly different operation. There will be some impact, but again nothing should go offline. Is there anything special that you are doing in the Role Start events? Have you tried performing the operation via PowerShell or some other tool that invokes the Service Management API?
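
As a rough illustration of what a tool-driven scale operation changes, the sketch below edits the `Instances count` attribute in a service configuration (.cscfg) document and base64-encodes the result, which is the form the Service Management API's configuration-change request carries in its body. The service name, role name, and sample XML are hypothetical and only there to make the example runnable; no request is sent.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace used by Windows Azure service configuration (.cscfg) files.
NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"

# Hypothetical configuration for illustration; not the original poster's file.
SAMPLE_CSCFG = f"""<ServiceConfiguration serviceName="ExampleService" xmlns="{NS}">
  <Role name="WebRole1">
    <Instances count="1" />
  </Role>
</ServiceConfiguration>"""


def set_instance_count(cscfg_xml: str, role_name: str, count: int) -> str:
    """Return the configuration XML with the named role's instance count changed."""
    ET.register_namespace("", NS)  # keep the default namespace when serializing
    root = ET.fromstring(cscfg_xml)
    for role in root.findall(f"{{{NS}}}Role"):
        if role.get("name") == role_name:
            role.find(f"{{{NS}}}Instances").set("count", str(count))
            return ET.tostring(root, encoding="unicode")
    raise ValueError(f"role {role_name!r} not found")


updated = set_instance_count(SAMPLE_CSCFG, "WebRole1", 2)
# The configuration travels base64-encoded inside the request body.
encoded = base64.b64encode(updated.encode("utf-8")).decode("ascii")
print(updated)
```

The point of the sketch is that a scale request is a configuration change like any other, which is why the platform treats it through the same update path.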

    Ranjith

    http://www.opstera.com

    March 13, 2012 3:43
  • Hi SagerCat,

    Are you upgrading from 1 instance to 2 instances ?


    Be nice to nerds ... Chances are you'll end up working for one!

    March 13, 2012 6:44
  • Nothing special going on in the Role Start events. Thanks for the tip: it did force me to go have a look to be sure, but there is nothing special at all.

    It does seem like most of the outages happen while moving from 1 to 2 instances...is this a known problem? If so, I hope you will agree that makes _no_ sense...the scaling operation is being applied in response to load, and shouldn't affect a running deployment regardless of the node count. Does the SLA specifically call out that scaling is also impacted by having 1 vs. 2 (or more)?

    Thanks for the info!

    March 13, 2012 11:51
  • It has to do with the upgrade domains that define how Windows Azure instances are being upgraded. When you change your instance count from 1 to 2, you basically redeploy a new configuration file, which will result in a recycle of the instance. The configuration file can contain multiple changes, so it has to be reloaded to make sure all changes to the configuration file are picked up.

    Since you only have 1 instance, it will go down while being upgraded. That's why the SLA is based upon 2 instances. If you are using 2 instances, upgrading by upgrade domain leaves you with one running instance while the other is upgraded, and the Windows Azure load balancer makes sure all requests end up at the available instance during the upgrade. That way your application stays available while it's being upgraded.

    The suggestion most people will make is to have 2 running instances of your deployment, which provides you with the availability.
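
The upgrade-domain walk described above can be sketched as a small model. It assumes instances are spread round-robin across upgrade domains and that exactly one domain is offline per step; both are simplifications for illustration, and the function name is hypothetical.

```python
def rolling_upgrade_availability(instance_count: int, upgrade_domains: int) -> list[int]:
    """Return how many instances keep serving at each step of a rolling upgrade.

    Assumption of this model: instances are assigned round-robin to upgrade
    domains, and one domain at a time is taken offline.
    """
    domains_in_use = min(instance_count, upgrade_domains)
    domain_sizes = [0] * domains_in_use
    for i in range(instance_count):
        domain_sizes[i % domains_in_use] += 1
    # While a domain is offline, only the instances in the other domains serve.
    return [instance_count - size for size in domain_sizes]


for n in (1, 2):
    steps = rolling_upgrade_availability(n, upgrade_domains=5)
    print(f"{n} instance(s): serving per step = {steps}")
```

With a single instance the only step leaves zero instances serving, which matches the outage in the question; with two instances every step leaves one instance for the load balancer to route to.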


    Be nice to nerds ... Chances are you'll end up working for one!


    March 13, 2012 12:08
  • As others have mentioned, your configuration is changed and by default the web servers will be rebooted. Reboots are done with fault-tolerant logic if you have enough instances (2+).

    You can, however, disable the reboot. Check this thread for instructions: http://www.paraleap.com/qa/?qa=97/instances-dynamically-without-affecting-running-instances
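
The linked thread is not reproduced here. In the .NET service runtime, the restart decision is made in a handler for the `RoleEnvironment.Changing` event: setting `e.Cancel = true` asks the platform to recycle the instance, and leaving it `false` applies the change to the running instance. The Python below only models that decision so the logic is easy to see; the class and function names are hypothetical stand-ins, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SettingChange:
    """Hypothetical stand-in for a changed configuration setting value."""
    setting_name: str


@dataclass(frozen=True)
class TopologyChange:
    """Hypothetical stand-in for a change in a role's instance count."""
    role_name: str


def should_recycle(changes, recycle_on_settings: bool = True) -> bool:
    """Model of a Changing handler: recycle only when a setting value changed.

    A pure instance-count (topology) change is accepted without a restart.
    """
    return recycle_on_settings and any(isinstance(c, SettingChange) for c in changes)


print(should_recycle([TopologyChange("WebRole1")]))           # scale-only update
print(should_recycle([SettingChange("StorageConnection")]))   # setting edited
```

Whether skipping the restart is safe depends on the application being able to pick up changed settings at run time, which is why this is a choice the role code has to make explicitly.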


    Auto-scaling & monitoring service for Windows Azure applications at http://www.paraleap.com

    March 13, 2012 15:31
  • I understand the technical reason that this happens, but this (IMO) is a poor design decision by Azure. Because they offer no discrete instance increase/decrease, your deployment will suffer a rolling upgrade through all upgrade domains in order to increase capacity. So, if your application is under heavy load at 4 instances, in order to increase to 8, you must endure a period of _reduced_ capacity while the upgrades are complete - this really makes sense to everybody?

    Since elasticity is a key component of cloud-based architectures, I would have thought that the elasticity-oriented actions (instance +/-, VM sizing) would have been isolated from other deployment-based activities.
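
To put numbers on the 4-to-8 case in this post, here is a small back-of-the-envelope sketch. It assumes the existing instances are spread as evenly as possible across upgrade domains and that one domain is offline at a time; the domain counts tried are chosen only for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def worst_case_serving(current: int, upgrade_domains: int) -> int:
    """Instances still serving while the largest upgrade domain is offline."""
    domains_in_use = min(current, upgrade_domains)
    largest_domain = -(-current // domains_in_use)  # ceiling division
    return current - largest_domain


current, target = 4, 8  # figures from the post above
for domains in (2, 4):  # domain counts chosen only for illustration
    serving = worst_case_serving(current, domains)
    print(f"{domains} domains: {serving} of {current} serving, "
          f"{Fraction(serving, target)} of the {target}-instance target")
```

Under these assumptions the deployment runs below its pre-scale capacity at exactly the moment the extra capacity was requested, which is the objection being raised.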

    Very interesting...thanks for the info.

    March 13, 2012 20:09
  • Hi SagerCat,

    I'm a Windows Azure fan, but I agree with you on this topic.

    I believe it would be useful to be able to change the role instance count without having to cycle through the upgrade domains for each existing role. If it were a simple instance increase, the current instances would not have to be affected and there would not be a temporary capacity decrease while cycling through the instance upgrades.

    There might be things we are not thinking of, but hopefully it's one of the features that might appear in the future.



    Be nice to nerds ... Chances are you'll end up working for one!

    March 14, 2012 5:47
  • Hi All,

    I have to report a similar problem when changing instance count in a web role from 1 to 2.

    When I do this instance count change I am noticing no response from my service, starting about 30 secs after the change and until the second instance comes online (about 4-5 mins later). After that, although I have 2 instances online (I can check this through logging) only the 2nd one is answering - like the 1st instance is taken off the load balancer. I've checked that the 2 instances still communicate through internal endpoints, but the 1st is not receiving external requests.

    After that, when I change back to 1 instance, after a while I see both instances answering (for about 30 secs) - the 1st is activated in load balancer again, and then normally the 2nd instance goes offline and the 1st is online again. 

    If I want 2 instances activated after this change I have to reboot the 1st instance through the management portal to see them both answering. Everything is ok then.

    This is really strange behaviour: I have an offline period of about 5-10 minutes, and after that only one working instance, although I am charged for 2 and the Azure portal reports 2 active instances with status 'ready'. I noticed this happening last week. I believe it is a bug in the system, and I can say that it was not happening before. I have used this schedule (changing from 1 to 2 instances) for 2-3 months with no problems, but recently I have been experiencing this problem.

    Thank you for your attention.



    March 15, 2012 0:27
  • Hello Sagar,

    I can suggest something to avoid downtime in Azure: you can try a VIP swap.

    Hope this article will help you -

    http://msdn.microsoft.com/en-us/library/windowsazure/ee517253.aspx


    If you found this post useful, Please "Mark as Answer" or "Vote as Helpful". Best Regards

    March 19, 2012 21:39
  • See these similar threads:

    http://social.msdn.microsoft.com/Forums/en-US/windowsazuretroubleshooting/thread/3fcadf60-f14b-41a3-adf7-b24bc79a3f51

    http://social.msdn.microsoft.com/Forums/nl-NL/windowsazuretroubleshooting/thread/29f1fbe8-6be5-4f3a-8bbe-ff1c2041aed8

    In the first thread you'll see an answer to the issue when going from 1 instance to 2 instances, that only 1 of the 2 instances responds to the requests.


    Be nice to nerds ... Chances are you'll end up working for one!

    March 21, 2012 7:05