Azure Service Bus Is not stable or reliable enough for a production system

  • Wednesday, January 11, 2012 1:37 PM
     
     

    Just this week on Monday morning from around 8AM to 10:20AM EST, the service bus was not working. I use the North Central data center. I opened a case with Microsoft. 

    The same is happening this morning. I can't access it through the Management Portal, and none of our web services can use Relayed or Brokered messaging.

    I note the status shows green just like on Monday.

    I've been working with the Azure AppFabric ServiceBus actively since last October. Knowing what I know now, I see the platform is not yet ready for production.

    In addition to the multiple issues we've had just getting connected, this downtime, which Microsoft has not acknowledged, is unacceptable. It would be a lot better if the platform's unavailability were acknowledged. It's not 99.9% available.

All Replies

  • Wednesday, January 11, 2012 2:33 PM
     
     

    I'm not sure what time the service bus became unavailable this morning. I noticed only around 8AM EST. Now, around 9:32AM EST, it appears to once again be available.

     

  • Thursday, January 12, 2012 7:03 AM
    Moderator
     
     

    Hi,

    I am trying to involve someone familiar with Azure Cache Performance, so there will be a short delay. Sorry for any inconvenience.

    Thank you
    Please mark the replies as answers if they help or unmark if not. If you have any feedback about my replies, please contact msdnmg@microsoft.com Microsoft One Code Framework
  • Thursday, January 12, 2012 6:33 PM
    Owner
     
     

    Bogatiy,

    If you've opened a case with us, please use that going forward to troubleshoot these issues. I'm sure the community at large would appreciate it if you'd loop back to this thread and post your conclusions as well.


    Trevor Hancock (Microsoft)
    Please remember to "Mark As Answer" the replies that help.
  • Friday, January 13, 2012 9:11 AM
     
     

    Hi bogatiy

    Please continue posting these cases when Service Bus is not online. I'm considering SB for my next project and would like to know its uptime in general. I think it is very valuable for the community to know such things, since the company never reports on them.

     

    Thx

  • Sunday, January 29, 2012 3:56 PM
     
     

    Well, here's my loopback.

    Microsoft did respond pretty well to the case I opened.

    The case involved two incidents of unavailability during one week, specifically on 9 and 11 January.

    One incident was attributed to an 'overloaded front end to the service bus' and the other to '6% of nodes down'. I'm using the North Central US data center.

    As I read the SLA, which covers a 30-day period and offers 99.9% availability, the agreement was not met for January, since 4 hours of downtime were noted. I'm not sure what the remedy is when the SLA is not met. No money was lost, so the damages were minimal at the time; only development time was lost.
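
    The arithmetic behind that reading can be sketched as follows. This is only a rough check that assumes availability is measured as simple uptime over a 30-day window; the actual SLA may define and measure downtime differently. The function names are hypothetical, and the 4 hours is the figure reported in this post.

```python
# Rough SLA check. Assumptions (not from the SLA text itself):
# availability = 1 - downtime / window, over a 30-day window.
SLA_TARGET = 0.999   # 99.9% availability
WINDOW_DAYS = 30


def allowed_downtime_minutes(target: float = SLA_TARGET,
                             window_days: int = WINDOW_DAYS) -> float:
    """Downtime budget in minutes for a given availability target."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target)


def availability(downtime_minutes: float,
                 window_days: int = WINDOW_DAYS) -> float:
    """Fraction of the window during which the service was up."""
    window_minutes = window_days * 24 * 60
    return 1 - downtime_minutes / window_minutes


budget = allowed_downtime_minutes()
observed = availability(4 * 60)  # 4 hours of downtime, as noted in the case

print(f"Downtime budget: {budget:.1f} minutes")
print(f"Observed availability: {observed:.4%}")
print("SLA met" if observed >= SLA_TARGET else "SLA missed")
```

    Under these assumptions the budget is about 43 minutes per 30 days, so 4 hours of downtime works out to roughly 99.44% availability, which is below the 99.9% target.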

    Also, I asked why the unavailability of the service bus (and the Management Portal) was not visible on the Azure status page (http://www.windowsazure.com/en-us/support/service-dashboard/ ) and only got a response that the operations team is studying it further. One of the outages does show in the history for service bus on that page, but the other doesn't. And the one that shows in the history was not visible in real time; it only showed up later.

    So my overall conclusion is that the service bus is not yet ripe for mission-critical software and services. There are some monitoring limitations, which I've pointed out on other threads on this forum (and which Microsoft has said will be addressed in the near future), and there is no way to monitor accurate current status.
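
    Until the status page reflects outages in real time, one workaround is to keep your own record of probe results and derive outage windows from it. The sketch below covers only that bookkeeping. How a probe actually tests the service bus is left out, the names `Probe` and `outage_windows` are hypothetical, and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# One probe result: (time of the check, whether the check succeeded).
Probe = Tuple[datetime, bool]


def outage_windows(probes: List[Probe]) -> List[Tuple[datetime, datetime]]:
    """Collapse runs of failed probes into (first_failure, recovery) windows."""
    ordered = sorted(probes)
    windows: List[Tuple[datetime, datetime]] = []
    start: Optional[datetime] = None
    for ts, ok in ordered:
        if not ok and start is None:
            start = ts                    # outage begins
        elif ok and start is not None:
            windows.append((start, ts))   # outage ends at first success
            start = None
    if start is not None:
        # Still failing at the last probe: close the window there.
        windows.append((start, ordered[-1][0]))
    return windows


# Made-up sample: one probe every 10 minutes, failing from 8:00 to 10:10.
first = datetime(2012, 1, 9, 7, 50)
sample = [(first + timedelta(minutes=10 * i), not (1 <= i <= 14))
          for i in range(17)]

windows = outage_windows(sample)
total = sum((end - begin for begin, end in windows), timedelta())
for begin, end in windows:
    print(f"Outage from {begin} to {end}")
print(f"Total downtime: {total}")
```

    The measured downtime is only as precise as the probe interval, so a shorter interval gives a tighter estimate at the cost of more traffic.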

    Please note that I wrote 'mission critical' above, because most software isn't mission critical.

    Overall, I remain quite attracted to the functionality and pricing of Azure AppFabric, so I will continue to consider the service bus for future software development. The connectivity and associated services, such as ACS, made development on the project I'd been working on proceed much more quickly than if we'd chosen other technology avenues. Also, the architecture we implemented was intended to offer easy extensibility, and it delivered on that.

    Microsoft probably understands that Azure AppFabric is mission critical for them, and will improve the stability and availability of the service bus to support mission-critical software.

    Now I'm done with this thread ... and I'll not be monitoring availability for that last project any further, though the software is actively using the service bus and will continue to do so at least for the near future.

  • Friday, February 03, 2012 12:02 AM
    Moderator
     
     

    Hi Bogatiy, I think that the Service Bus development team would be interested in investigating your impression that the Azure SB is not ready for mission critical software/services. This is a production service and as you noted yourself, the SLA is quite specific. I know that we have members of the team monitoring this forum but if you require a more detailed analysis, please respond to me and I'll try to escalate the issue. 

    Thanks,
    SethM
    Microsoft

  • Friday, February 03, 2012 8:50 PM
     
     Proposed Answer

    Hi Bogatiy,

    I'm one of the architects for Service Bus. Thank you for sharing your concerns, and I’m sorry for the availability issues that we experienced last month due to a very significant rise in uptake of the Service Bus service, which we attribute to the new capabilities, the announced pricing model changes, and the fact that Service Bus is currently free of charge until the new pricing model kicks in. We’re taking the SLA as seriously as we do when billing is turned on.

    To handle the increased popularity of Service Bus for development and experimentation (which we see a lot of in great variety) and for production business use (you might yourself already have boarded a flight with your check-in having gone through Service Bus), we’ve meanwhile significantly increased capacity and we’re watching the system closely to see whether further capacity adjustments are needed. We’ve also discovered in the process that there were behaviors that we could and should improve when the system is at the upper edge of its capacity limits and we have meanwhile deployed a series of changes with the goal of achieving the desired behavior. Separately we’re working to close the status dashboard gap that you rightly point out.

    I’m happy to talk to you about Service Bus availability and “production readiness” in more concrete terms if you send me an email. My Microsoft.com mail alias is clemensv.  

    Best Regards
    Clemens Vasters