Azure Linux VM Backup Fails - Unable to stop/restart or update walinuxagent

  • Question

  • Hi,

    Ubuntu 16.04
    We have a Linux Azure VM whose backup failed with "Could not communicate with the VM agent for snapshot status - Snapshot VM sub task timed out". A manual backup failed with the same error.

    We checked the agent: it is in the Ready state, version 2.2.32. We tried restarting the agent, but the restart fails with a "Connection timed out" error. The same happened when we tried to update the agent: it could not stop the running walinuxagent.service and the process got stuck, so we abandoned it.

    systemctl restart walinuxagent.service - doesn't work
    systemctl stop walinuxagent.service - doesn't work

    Please let us know how to resolve this issue and initiate the backup again.
    An excerpt from /var/log/waagent.log follows:

    2019/07/01 06:25:04.525782 ERROR ExtHandler Event: name=Microsoft.Azure.RecoveryServices.SiteRecovery.Linux, op=Download, message=[ExtensionError] Non-zero exit code: 1, A2aVmExtension/handler.py disable
    [stdout]
    2019/07/01 06:25:02.516906 INFO ExtHandler Move process 28936 into cgroup for extension Microsoft.Azure.RecoveryServices.SiteRecovery.Linux


    [stderr]
    Traceback (most recent call last):
      File "/var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9117/A2aVmExtension/handler.py", line 32, in <module>
        from A2aVmExtension.CmdHandlers.install_and_configure \
      File "/var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9117/A2aVmExtension/CmdHandlers/install_and_configure.py", line 39, in <module>
        from main.ExtensionLib.utilities import get_cmd_output
    ImportError: No module named main.ExtensionLib.utilities
    , duration=0
    2019/07/01 06:25:04.539348 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9118] Target handler state: enabled
    2019/07/01 06:25:04.540780 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9118] [Enable] current handler state is: notinstalled
    2019/07/01 06:25:04.541169 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9118] Using existing extension package: /var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery__LinuxUBUNTU1604__1.0.0.9118.zip
    2019/07/01 06:25:04.541328 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9118] Unzipping extension package: /var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery__LinuxUBUNTU1604__1.0.0.9118.zip
    2019/07/01 06:25:04.793542 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9118] Initialize extension directory
    2019/07/01 06:25:04.795868 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9118] Update settings file: 0.settings
    2019/07/01 06:25:04.797549 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9117] Disable extension [A2aVmExtension/handler.py disable]
    2019/07/01 06:25:04.801784 INFO ExtHandler Move process 29005 into cgroup for extension Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604
    2019/07/01 06:25:06.809827 ERROR ExtHandler Event: name=Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604, op=Download, message=[ExtensionError] Non-zero exit code: 1, A2aVmExtension/handler.py disable
    [stdout]
    2019/07/01 06:25:04.801784 INFO ExtHandler Move process 29005 into cgroup for extension Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604


    [stderr]
    Traceback (most recent call last):
      File "/var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9117/A2aVmExtension/handler.py", line 32, in <module>
        from A2aVmExtension.CmdHandlers.install_and_configure \
      File "/var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9117/A2aVmExtension/CmdHandlers/install_and_configure.py", line 39, in <module>
        from main.ExtensionLib.utilities import get_cmd_output
    ImportError: No module named main.ExtensionLib.utilities
    , duration=0
    2019/07/01 06:25:06.896216 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9118] Remove extension handler directory: /var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9118
    2019/07/01 06:25:06.916748 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9118] Remove extension handler directory: /var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9118
    2019/07/01 06:25:09.939578 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9118] Target handler state: enabled
    2019/07/01 06:25:09.940387 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9118] [Enable] current handler state is: notinstalled
    "waagent.log" [readonly] 16542553L, 2311268155C

    Regards,
    Gaurav N.

    Wednesday, July 31, 2019 9:01 AM

All replies

  • Following is the output from /var/log/syslog:

    Jul 31 06:26:44 PMS-PROD CRON[18928]: (CRON) info (No MTA installed, discarding output)
    Jul 31 06:26:50 PMS-PROD python3[8425]: 2019/07/31 06:26:50.309213 ERROR ExtHandler Command: [systemctl daemon-reload], return code: [1], result: [Failed to execute operation: Connection timed out
    Jul 31 06:26:50 PMS-PROD python3[8425]: ]
    Jul 31 06:27:01 PMS-PROD CRON[19256]: (root) CMD ([ -f /etc/krb5.keytab ] && [ \( ! -f /etc/opt/omi/creds/omi.keytab \) -o \( /etc/krb5.keytab -nt /etc/opt/omi/creds/omi.keytab \) ] && /opt/omi/bin/support/ktstrip /etc/krb5.keytab /etc/opt/omi/creds/omi.keytab >/dev/null 2>&1 || true)
    Jul 31 06:27:15 PMS-PROD python3[8425]: 2019/07/31 06:27:15.344755 ERROR ExtHandler Command: [systemctl start system-walinuxagent.extensions-Microsoft.Azure.RecoveryServices.SiteRecovery.Linux_1.0.0.9119.slice], return code: [1], result: [Failed to start system-walinuxagent.extensions-Microsoft.Azure.RecoveryServices.SiteRecovery.Linux_1.0.0.9119.slice: Connection timed out
    Jul 31 06:27:15 PMS-PROD python3[8425]: See system logs and 'systemctl status system-walinuxagent.extensions-Microsoft.Azure.RecoveryServices.SiteRecovery.Linux_1.0.0.9119.slice' for details.
    Jul 31 06:27:15 PMS-PROD python3[8425]: ]
    Jul 31 06:27:15 PMS-PROD python3[8425]: 2019/07/31 06:27:15.345293 INFO ExtHandler Created slice for system-walinuxagent.extensions-Microsoft.Azure.RecoveryServices.SiteRecovery.Linux_1.0.0.9119.slice
    Jul 31 06:27:15 PMS-PROD python3[8425]: 2019/07/31 06:27:15.347368 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9119] Update settings file: 0.settings
    Jul 31 06:27:15 PMS-PROD python3[8425]: 2019/07/31 06:27:15.348448 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9117] Disable extension [A2aVmExtension/handler.py disable]
    Jul 31 06:27:16 PMS-PROD python3[8425]: 2019/07/31 06:27:16.352639 INFO ExtHandler Started extension using scope 'Microsoft.Azure.RecoveryServices.SiteRecovery.Linux_1.0.0.9117_93a2b419-de66-4f57-9fe4-7cfe5a1487ba'
    Jul 31 06:27:16 PMS-PROD python3[8425]: 2019/07/31 06:27:16.353762 INFO ExtHandler Started tracking new cgroup: Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9117, path: /sys/fs/cgroup/cpu/system.slice/Microsoft.Azure.RecoveryServices.SiteRecovery.Linux_1.0.0.9117_93a2b419-de66-4f57-9fe4-7cfe5a1487ba.scope
    Jul 31 06:27:16 PMS-PROD python3[8425]: 2019/07/31 06:27:16.355148 INFO ExtHandler Started tracking new cgroup: Microsoft.Azure.RecoveryServices.SiteRecovery.Linux-1.0.0.9117, path: /sys/fs/cgroup/memory/system.slice/Microsoft.Azure.RecoveryServices.SiteRecovery.Linux_1.0.0.9117_93a2b419-de66-4f57-9fe4-7cfe5a1487ba.scope
    Jul 31 06:27:41 PMS-PROD python3[8425]: 2019/07/31 06:27:41.383377 ERROR ExtHandler Event: name=Microsoft.Azure.RecoveryServices.SiteRecovery.Linux, op=Download, message=[ExtensionError] Non-zero exit code: 1, A2aVmExtension/handler.py disable
    Jul 31 06:27:41 PMS-PROD python3[8425]: [stdout]
    Jul 31 06:27:41 PMS-PROD python3[8425]: [stderr]
    Jul 31 06:27:41 PMS-PROD python3[8425]: Failed to start transient scope unit: Connection timed out
    Jul 31 06:27:41 PMS-PROD python3[8425]: , duration=0
    Jul 31 06:27:41 PMS-PROD python3[8425]: 2019/07/31 06:27:41.417250 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9119] Target handler state: enabled
    Jul 31 06:27:41 PMS-PROD python3[8425]: 2019/07/31 06:27:41.425732 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9119] [Enable] current handler state is: notinstalled
    Jul 31 06:27:41 PMS-PROD python3[8425]: 2019/07/31 06:27:41.434768 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9119] Using existing extension package: /var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery__LinuxUBUNTU1604__1.0.0.9119.zip
    Jul 31 06:27:41 PMS-PROD python3[8425]: 2019/07/31 06:27:41.446586 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9119] Unzipping extension package: /var/lib/waagent/Microsoft.Azure.RecoveryServices.SiteRecovery__LinuxUBUNTU1604__1.0.0.9119.zip
    Jul 31 06:27:41 PMS-PROD python3[8425]: 2019/07/31 06:27:41.716480 INFO [Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9119] Initializing extension Microsoft.Azure.RecoveryServices.SiteRecovery.LinuxUBUNTU1604-1.0.0.9119
    Jul 31 06:28:01 PMS-PROD CRON[19346]: (root) CMD ([ -f /etc/krb5.keytab ] && [ \( ! -f /etc/opt/omi/creds/omi.keytab \) -o \( /etc/krb5.keytab -nt /etc/opt/omi/creds/omi.keytab \) ] && /opt/omi/bin/support/ktstrip /etc/krb5.keytab /etc/opt/omi/creds/omi.keytab >/dev/null 2>&1 || true)
    Jul 31 06:28:06 PMS-PROD python3[8425]: 2019/07/31 06:28:06.749254 ERROR ExtHandler Command: [systemctl daemon-reload], return code: [1], result: [Failed to execute operation: Connection timed out
    Jul 31 06:28:06 PMS-PROD python3[8425]: ]
    "/var/log/syslog" [readonly] 2507L, 498528C

    Wednesday, July 31, 2019 10:02 AM
  • Did you try reinstalling the WAlinuxagent?
    Wednesday, July 31, 2019 11:28 AM
    Moderator
  • We tried updating the agent from version 2.2.32 to the latest, 2.2.40. The update failed after the package download because it could not stop the currently running agent service.
    Wednesday, July 31, 2019 11:31 AM
  • This issue warrants further investigation by the Linux VM team, which would require a 1:1 conversation. It is better taken up on the technical support platform by creating a support request. Please let me know if you have any limitations in creating a support ticket.
    Wednesday, July 31, 2019 12:00 PM
    Moderator
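
The log excerpts in this thread are long (the waagent.log status line shows about 16.5 million lines), so it may help to pull out just the ERROR entries before attaching them to a support request. The following is a minimal sketch, not an official tool: the function names `error_entries` and `count_timeouts` are made up for illustration, and the regular expression assumes the timestamp and level layout seen in the excerpts above, with or without the syslog prefix.

```python
import re
import sys

# Matches agent entries such as
#   "2019/07/31 06:26:50.309213 ERROR ExtHandler Command: ..."
# The pattern is searched anywhere in the line, so a leading syslog prefix
# ("Jul 31 06:26:50 PMS-PROD python3[8425]: ") is simply skipped.
# Assumption: this layout is inferred from the excerpts in this thread only.
ENTRY = re.compile(
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]+) (?P<msg>.*)"
)


def error_entries(lines):
    """Yield (timestamp, message) for each ERROR-level agent entry."""
    for line in lines:
        match = ENTRY.search(line)
        if match and match.group("level") == "ERROR":
            yield match.group("ts"), match.group("msg").strip()


def count_timeouts(lines):
    """Count ERROR entries whose first line mentions 'Connection timed out'."""
    return sum(
        1 for _, msg in error_entries(lines) if "Connection timed out" in msg
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Stream the file line by line so a multi-gigabyte log is never
    # loaded into memory at once.
    with open(sys.argv[1], errors="replace") as log:
        for ts, msg in error_entries(log):
            print(ts, msg)
```

Saved under a name of your choosing (for example `waagent_errors.py`, a hypothetical filename), it could be run as `python3 waagent_errors.py /var/log/waagent.log`. Continuation lines of multi-line entries, such as tracebacks, carry no timestamp and are skipped by this sketch.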