Network Partitions with Azure VMs

    Question

  • Hi,

    We are running a RabbitMQ cluster across two Azure VMs, and the cluster breaks down intermittently (about every week or two). After discussing this on the RabbitMQ forum and analysing the logs, it seems to be the result of a network partition.

    At present we do not run the two VMs in a virtual network and wondered if this might be a solution to the problem? Does anyone have any experience with Virtual Networks and could they advise if this will improve network stability between the two nodes?

    Thanks in advance.

    Ben



    • Edited by bdubzzz Monday, January 28, 2013 11:28 AM
    Monday, January 28, 2013 11:28 AM

All replies

  • Hi,

    The Windows Azure Storage Team has a couple of posts you should look at:

    Scalability targets for the Storage service

    How to get the most out of Tables.

    You typically improve performance by using more than one PartitionKey because the Storage service has lower scalability targets for a single partition than it does for a storage account with multiple partitions. The reason for my hedge was that performance obviously depends on what you are trying to do. However, in general, you improve performance by having more partitions.


    Monday, February 4, 2013 3:00 AM
    Moderator
  • Hi Tom,

    Thanks for your reply.

    The VMs are only being used as application servers, so I'm not sure the information you have provided is applicable. We are not having any issues with storage accounts. The issue is with the RabbitMQ nodes, one running on each VM, which are joined together in a cluster. RabbitMQ stores its queue data locally in the server's file system, not in a storage account.

    Every week or two, when I check the state of the cluster, I find that it has broken. According to the RabbitMQ console and logs, the cluster broke down due to a "network partition".

    According to Wikipedia (http://en.wikipedia.org/wiki/Network_partition), "A network partition refers to the failure of a network device that causes a network to partition. For example, in a network with multiple subnets where nodes A and B are located in one subnet and nodes C and D are in another. If the switch between the subnets fails, then the network is partitioned and nodes A and B can no longer communicate with nodes C and D."

    As mentioned before, we are not networking the VMs in any way at the moment. RabbitMQ is simply clustered by communicating through a particular port.
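
    One way to gather evidence about connectivity between the nodes is to probe the clustering ports from each VM at regular intervals and log any failures. The following is a minimal sketch, not a tested diagnostic: the host name is a placeholder, 4369 is the default epmd port, and the inter-node distribution port (25672 here) is an assumption that depends on the RabbitMQ version and configuration.

```python
import socket
import time


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # Hypothetical peer host; replace with the other cluster node.
    peer = "rabbit-node-2.example.com"
    # 4369 = epmd default; 25672 = assumed distribution port (check your config).
    ports = [4369, 25672]
    while True:
        for port in ports:
            if not can_connect(peer, port):
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(f"{stamp} cannot reach {peer}:{port}")
        time.sleep(30)
```

    A successful TCP connect only shows that the port is reachable at that moment, so short outages between probes can still be missed.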

    Apologies if I've misunderstood your response, however I believe the issue is a networking issue and not a storage partition issue.

    Thanks,

    Ben


    Monday, February 4, 2013 10:54 AM
  • Ben - have you had any further response to this or success?

    We are experiencing exactly the same issues with our VMs running RabbitMQ. At first I thought it could be because we only had two nodes in the cluster (each node running on a different VM but within the same VLAN), so I added a third. It was fine for about a day before partitioning started occurring again.

    Deems


    Friday, July 19, 2013 8:58 AM
  • Hi Deems,

    No, so far we are still in the same predicament. No one has offered any advice, so we are currently just having to monitor and rectify the network partitions as and when they occur. Sometimes they are a week apart, sometimes months apart.
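
    The manual monitoring could be scripted. The following is a minimal sketch, assuming the management plugin is enabled and that its `/api/nodes` endpoint returns a `partitions` list for each node. The URL, port, and credentials are placeholders, not values from our setup.

```python
import base64
import json
import urllib.request


def partitioned_nodes(nodes):
    """Map each node name to the peers it reports being partitioned from.

    `nodes` is the decoded JSON list from GET /api/nodes. Nodes whose
    `partitions` field is empty or missing are left out.
    """
    return {n["name"]: n["partitions"] for n in nodes if n.get("partitions")}


def fetch_nodes(base_url, user, password):
    """Fetch /api/nodes from the management plugin using HTTP basic auth."""
    req = urllib.request.Request(base_url.rstrip("/") + "/api/nodes")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical endpoint and credentials; adjust for your installation.
    nodes = fetch_nodes("http://localhost:15672", "guest", "guest")
    broken = partitioned_nodes(nodes)
    if broken:
        for name, peers in broken.items():
            print(f"{name} is partitioned from: {', '.join(peers)}")
    else:
        print("no partitions reported")
```

    Run on a schedule, a check like this would at least flag a partition soon after it happens rather than at the next manual inspection.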

    There are two options we are considering at the moment - 

    1) using the Federation or Shovel plugins, which are designed for intermittent connectivity

    2) upgrading to the latest RabbitMQ (v3.1.3 at the time of posting) and trying out the new feature for auto-handling network partitions

    If you get a chance to try either of these solutions before I do it would be good to get your feedback on whether they rectified the issues.

    Ben

    Tuesday, July 23, 2013 8:42 AM