Service Fabric - FabricTransientException while removing application package from image store

  • Question

  • Service Fabric version: 6.4.644.9590 (secured)
    Build Agent version: 2.150.3
    ARM Managed clusters
    Azure hosted

    Using an Azure DevOps release pipeline, we intermittently encounter the following error when releasing with the Service Fabric Application Deployment task:

    Removing application package from image store...
    Exception occurred: System.Fabric.FabricTransientException
    Retrying to remove application package..
    Exception occurred: System.Fabric.FabricTransientException
    Retrying to remove application package..
    Exception occurred: System.Fabric.FabricTransientException
    ##[warning]The certificate with thumbprint CD9F374F03B092BXXXXXXXXX is not present in the local certificate store. This can potentially cause errors. If the release/build fails, please re-try it or ensure that multiple agents are not running builds/releases using the same service endpoint simultaneously on the same machine.
    ##[error]Could not ping any of the provided Service Fabric gateway endpoints.

    The pipeline fails with the [error] above, and the application remains unavailable until a release succeeds, which usually takes a second, manual run of the release pipeline.

    This is happening on multiple Service Fabric instances, multiple nodes, and multiple applications on different ports. It appears to occur when more than one application is being deployed simultaneously to the same nodeset. When we release manually after the initial failure, we deploy one application at a time, and that succeeds.

    • Edited by mr0271 Monday, May 13, 2019 8:33 PM
    Friday, May 10, 2019 10:50 PM

All replies

  • What have you done so far to try to isolate the issue? There seem to be a lot of areas where this could go wrong, so it might be hard to point you in one direction.

    Saturday, May 11, 2019 1:08 AM
  • Although it occurs on multiple nodesets, for testing I've isolated it to one nodeset with three applications running. When I execute the release pipelines for two applications within approximately 10 seconds of each other, one of them typically fails.

    Thinking it might be related to the image store, I connected to the image store via PowerShell (Get-ServiceFabricImageStoreContent) and watched as the packages were loaded for the two apps. When one of the two deploy tasks fails, its application package remains on the image store.

    During one test, I manually removed the package (Remove-ServiceFabricApplicationPackage) while the task was attempting to remove it. The deploy task still failed. This leads me to believe that the second task actually cannot connect to the cluster.

    Is it possible that the Service Fabric client certificate gets removed from the build VM as part of the deployment, and that is why the second agent can no longer connect? Both agents are on the same VM.

    Monday, May 13, 2019 6:11 PM
  • My thought above is correct. Our VSTS service connections are set up to connect to the cluster using certificates. The Deploy Service Fabric Application phase connects to the Service Fabric cluster, downloads the client certificate, stores it in the current user cert store, and performs the package deploy. It then deletes the client certificate before moving on to the next phase in the release. Since both agents are on the same VM and share a single cert store, the second release can no longer connect once the first release has deleted the client cert.
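
    To make the sequence above concrete, here is a rough Python sketch of the race as I understand it. It is only a toy model, not the task's actual code: the step names, the `deploy_task` and `run` helpers, the "CLIENT-CERT" thumbprint, and the schedules are all made up for illustration, and the shared cert store is just a Python set.

```python
def deploy_task(app, store, thumbprint):
    """Hypothetical model of one deploy phase, yielding after each step."""
    store.add(thumbprint)  # assumed: the task installs the client cert first
    yield "cert installed"
    # Assumed intermediate steps; each one needs the cert to reach the cluster.
    for action in ("copy package",
                   "register and upgrade",
                   "remove package from image store"):
        if thumbprint not in store:
            raise ConnectionError(
                f"{app}: cannot connect during '{action}' (client cert missing)"
            )
        yield action
    store.discard(thumbprint)  # the task deletes the cert when it finishes
    yield "cert removed"


def run(schedule, thumbprint="CLIENT-CERT"):
    """Advance tasks one step at a time in the given order.

    Both tasks share one `store`, like two agents on the same VM sharing
    the current user cert store. Returns {app: error message} for failures.
    """
    store = set()
    tasks = {}
    failures = {}
    for app in schedule:
        if app in failures:
            continue  # a failed task takes no further steps
        if app not in tasks:
            tasks[app] = deploy_task(app, store, thumbprint)
        try:
            next(tasks[app])
        except ConnectionError as exc:
            failures[app] = str(exc)
        except StopIteration:
            pass
    return failures


# One release at a time: A finishes all five steps before B starts.
serial = ["A"] * 5 + ["B"] * 5

# B starts shortly after A; A deletes the shared cert while B is mid-deploy.
overlapping = ["A", "A", "B", "A", "B", "A", "B", "A", "B", "B"]

serial_failures = run(serial)
overlap_failures = run(overlapping)
```

    In this model the serial schedule produces no failures, while in the overlapping schedule release B fails at its remove-package step, because release A deleted the shared cert one step earlier. That matches what we see when the releases are re-run manually one at a time.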

    I tried adding the client cert to the local machine cert store but it doesn't appear that the scripts will recognize it.

    Looks like we need to set up AAD for these service connections.

    • Edited by mr0271 Monday, May 13, 2019 8:26 PM
    Monday, May 13, 2019 8:24 PM
  • I think it would be best to get in touch with a Support engineer who can take a look at the backend logs during the creation to see what the issue might be. Do you have the ability to open a technical support ticket? If not, you can email me at AzCommunity@microsoft.com and provide me with your Azure SubscriptionID and link to this thread. I can then enable your subscription for a free support request to get this sorted out. 

    Monday, May 13, 2019 9:26 PM
  • Any update on this issue? 

    Friday, May 17, 2019 8:18 PM
  • Any update on this thread? 
    Friday, May 31, 2019 7:11 PM