How to best simulate VM failure in an ACS Kubernetes cluster

  • Question

  • In preparation for deploying a production Kubernetes cluster on ACS, we are trying to simulate a VM failure.

    So far we have tried a few approaches, none of which has proven very realistic:

    Approach 1:
    az acs scale --new-agent-count <decrement-by-one>
    This seems to drain the VM of pods very gracefully, which is not a realistic VM failure.

    Approach 2:
    az vm delete -g <resource-group> -n <agent-instance> --verbose --no-wait -y
    This kills the VM, but leaves its disk and Public IP resources behind. Also, the underlying AvailabilitySet does not seem to replace the lost VM.

    Approach 3:
    We used the Stop button on the AvailabilitySet agent VM in the console.
    This marks pods that have persistent volumes as 'Unknown', but does not move them to another agent VM.
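
For anyone wanting to repeat Approach 3 from the command line, a rough sketch follows. It is an approximation rather than an exact equivalent: `az vm stop` powers the VM off without deallocating it, whereas the portal's Stop button also deallocates. The resource group and VM names are hypothetical placeholders, and the `run` helper and `DRY_RUN` switch are illustrative additions so that the script only prints the commands by default.

```shell
#!/usr/bin/env bash
# Sketch only: stop one agent VM and print cluster state before and after.
# RESOURCE_GROUP and AGENT_VM are hypothetical placeholders, not real names.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
AGENT_VM="${AGENT_VM:-my-agent-vm}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

simulate_vm_stop() {
  # Record where pods are scheduled before the failure.
  run kubectl get pods --all-namespaces -o wide
  # Power off the agent VM without draining it first.
  run az vm stop -g "$RESOURCE_GROUP" -n "$AGENT_VM"
  # Check node and pod status after the VM is stopped.
  run kubectl get nodes
  run kubectl get pods --all-namespaces -o wide
}

simulate_vm_stop
```

Run with `DRY_RUN=0` to execute the commands for real. The node may take a short while to be reported as not ready, so the two `kubectl` checks may need to be repeated.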

    Could you please advise on the recommended way to simulate an Azure Container Service agent VM failure, such that the agent VM fails and a replacement is provisioned by the ACS agent AvailabilitySet?

    • Changed type vikranth s Tuesday, February 13, 2018 6:12 PM Question
    Tuesday, February 13, 2018 5:31 PM

All replies

  • Support for Microsoft Azure Container Service has been moved to Stack Overflow. For any queries about Microsoft Azure Container Service, you may post here.

    Do click on "Mark as Answer" on the post that helps you, this can be beneficial to other community members.

    Tuesday, February 13, 2018 6:11 PM