Time/Time Zone Issues? Unknown Security Errors? Configure-BGPNAT Failing? How I Fixed The DST Deployment Bug

  • Question

  • When I first started with TP2 when it was released, I did deployments and re-deployments multiple times with no issues whatsoever. I came to do an environment refresh a couple of days ago and just hit a wall.

    Consistent failures at step 22:

    "Invoke-EceAction : Task: Invocation of interface 'Configure' of role 'Cloud\Fabric\BGP' failed: 
    Function 'Configure-BGPNAT' in module 'Roles\BGP\SetupBGP.psm1' raised an exception:
    Processing data for a remote command failed with the following error message: An unknown security error 
    occurred."

    I also saw numerous errors relating to time syncing between MAS-DC01, MAS-BGPNAT01 and the physical Host.

    Erik posted in this thread about the same issue, which helped me get to a reliable fix for the problems. Thanks Erik!

    https://social.msdn.microsoft.com/Forums/azure/en-US/ec0d8486-7cb6-4655-a0b7-dbda0defded4/troubleshooting-azure-stack-tp2-installation-step-4041?forum=AzureStack

    I believe this is a time-related issue with the TP2 images.

    • The TP2 package was released before the Pacific time zone's DST change, i.e. before Nov 6th.
    • I had no deployment issues until I tried to deploy after Nov 6th.
    • When MAS-DC01 is created, it has the DST time of UTC-7h, not the correct non-DST time of UTC-8h.
    • As time syncing is disabled for the other VMs that are deployed, and for the physical host, MAS-DC01 becomes the authoritative time source for the environment.
    • The physical host will fail a time sync as the required clock change is too large.
    • This time inconsistency causes a terminating error when it comes to the MAS-BGPNAT01 deployment.
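
To put a number on the mismatch described above, here is a minimal sketch in Python. It assumes fixed offsets of UTC-7 and UTC-8 for the two interpretations of the same wall-clock reading, and it assumes the Windows default Kerberos clock-skew tolerance of 5 minutes (a domain policy can change that). The function name and the sample reading are illustrative only and are not part of the Azure Stack tooling.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for the two interpretations discussed above.
PDT = timezone(timedelta(hours=-7), "PDT")  # daylight time, in effect before Nov 6
PST = timezone(timedelta(hours=-8), "PST")  # standard time, in effect after Nov 6

# Assumption: the default Kerberos tolerance on Windows (5 minutes).
KERBEROS_MAX_SKEW = timedelta(minutes=5)


def skew_between(wall_clock: datetime) -> timedelta:
    """Gap between the two instants that one naive wall-clock reading can mean."""
    as_pdt = wall_clock.replace(tzinfo=PDT)
    as_pst = wall_clock.replace(tzinfo=PST)
    return abs(as_pst - as_pdt)


# Arbitrary example reading; any wall-clock value gives the same gap.
reading = datetime(2016, 11, 17, 12, 0, 0)
skew = skew_between(reading)

print("skew:", skew)
print("exceeds Kerberos tolerance:", skew > KERBEROS_MAX_SKEW)
```

A one-hour gap is far outside a five-minute tolerance, which is consistent with the "unknown security error" appearing once remote commands need to authenticate.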

    After some trial and error, and with the help of Erik's post, this is how I now deploy TP2. This has been problem free for the last three or four deployments.

    1. Prepare your CloudBuilder.vhdx and boot into it.
    2. Set up the networking and prerequisites as per the docs.
    3. Ensure the time zone of the Host is set to 'Pacific Standard Time'.
    4. Ensure the time is syncing from an NTP source. The default, time.windows.com, works well.
    5. Start the Azure Stack deployment.
    6. Keep an eye on things. The following steps require interaction at specific points, which can pass quite quickly.
    7. As soon as MAS-DC01 VM is created in Hyper-V, open the console.
    8. As soon as it is online and you have a command prompt, use the TIME command to correct the clock. You will need to move it forward one hour to match the current Pacific Standard Time.
    9. Close the console window; you can leave it logged in. Close any Hyper-V or other windows you have open. I have seen the scheduled reboot of the physical host fail because it times out waiting for other programs to close.
    10. The deployment will continue as normal, MAS-DC01 will reboot.
    11. Then the physical host will reboot to join the azurestack.local domain. This pauses the MAS-DC01 VM, which of course pauses the clock.
    12. Once the physical host and MAS-DC01 are back up, open the console to MAS-DC01 again.
    13. Again, use the TIME command to correct the clock to the current Pacific Standard Time. You will find the clock is off by 3-5 minutes, depending on how long your physical host took to reboot.
    14. Again, close the console window and any other windows you have open.
    15. The deployment will now continue to run as normal, until it completes.
    16. The time sync steps between MAS-DC01, Host and MAS-BGPNAT01 will now complete as the clocks are all close enough to actual time to be corrected automatically.
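
Steps 8 and 13 both come down to reading a trusted clock and typing the equivalent Pacific Standard Time into the TIME command. As a minimal sketch, assuming you have a trusted UTC reading and that a fixed UTC-8 offset applies, the conversion looks like this in Python. The helper name is hypothetical, the sample reading is arbitrary, and the sketch assumes the console accepts a 24-hour HH:MM:SS value.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Pacific Standard Time (no DST), as used in the steps above.
PST = timezone(timedelta(hours=-8), "PST")


def time_command_argument(trusted_utc: datetime) -> str:
    """Convert a trusted UTC reading into an HH:MM:SS string in PST."""
    return trusted_utc.astimezone(PST).strftime("%H:%M:%S")


# Arbitrary example: a trusted clock reads 19:40:00 UTC.
reading = datetime(2016, 11, 17, 19, 40, 0, tzinfo=timezone.utc)
print("TIME", time_command_argument(reading))
```

In practice, simply reading the time from a correctly synced machine works just as well; the point is only that the value entered must be the standard-time (UTC-8) reading.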

    After a few hours, you should have Azure Stack successfully deployed. Unfortunately, this takes the fully automated deployment off the table, but at least things work again.

    This may help some of you with other issues, especially if you are seeing "security" related errors, as an incorrect clock can cause havoc with Kerberos.

    Hopefully this can be properly fixed without having to wait for TP3.

    Thanks :)




    Thursday, November 17, 2016 7:40 PM

All replies

  • This is something we addressed in the TP2 Refresh build that was released today. Please download it and give it a try. With this refresh release we also released the updated App Services, SQL and MySQL RPs for TP2.

    Thanks,

    -Steve


    Steve Linehan | Principal Program Manager | Microsoft Enterprise Cloud Group

    Friday, November 18, 2016 5:02 AM
  • Thank you, Chris, for your detailed workaround for the time sync issues we have started to see with the 0913.1 build.

    The good news is that this fix made it into our November refresh build, 1104.1, which is now publicly available.

    Since you are very familiar with this issue/experience, it would be fantastic if you could redeploy with this latest build, and make sure it works as expected (without any workarounds).

    Note - there are still areas where it may fail (not related to time sync issues), so if you experience this, please attempt:

    .\InstallAzureStackPOC.ps1 -Rerun

    We have improved this functionality, so it knows where the deployment failed, and continues from there.

    Of course, please keep sending your feedback, and fantastic community support!

    -Charles


    Charles Joy [MSFT] https://twitter.com/OrchestratorGuy

    Friday, November 18, 2016 5:04 AM
    Owner
  • Thanks Charles.

    I am downloading the new build as I type this. Will do a deployment or two and see how I go.

    Will reply with some feedback.

    Thanks to everyone for all the work on Azure Stack. Loving it :)

    Friday, November 18, 2016 7:09 AM
  • Yep, all good now :)

    Have done a couple of deployments, no issues. All up and running.

    Thanks!

    Sunday, November 20, 2016 5:53 AM