ADF Continuous Integration - DataLake fails if self hosted integration selected

    Question

  • I have a development ADF.  It has a registered self-hosted integration runtime.  It has a linked service to Azure Data Lake Gen 1 (ADL) which uses authentication type "Managed Service Identity" (MSI).  All works OK after adding the ADF's identity to the permissions on the ADL.

    Overall we are trying to set up a continuous integration pipeline (using the documented steps) and it looks like there are a few bumps along the way that haven't been resolved.

    If I create a blank second ADF and then use the exported Resource Manager (RM) template to reproduce a production version (changing the name to the new ADF, of course), two issues unfold.

    The main problem is that the RM template deployment fails with an error saying it could not encrypt the Data Lake credentials.  Interesting, because there aren't any credentials in the linked service when you select MSI authentication.

    The second issue is that the deployment creates a self-hosted integration runtime record in the new ADF using the details from the source RM template.  It fails to connect to the IR, obviously, because an IR can only be registered to one ADF (the original) and then subsequently shared with others.

    • Are there any instructions on how to set this up with a post-deployment script?
    • Even then, it is not ideal, because you end up with a production IR that is "linked" to a shared development IR.  I suspect the better answer is to create a new IR post-deployment and run scripts to register the IR against the new host ADF using the server location and new keys (a rough sketch of such a resource is below).  At the moment, the CI page in the ADF documentation does not cover this scenario.
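
    For illustration, something like the following RM resource would declare the fresh IR in the new factory.  The names and apiVersion here are my guesses, and registering an actual IR node with the new keys would still have to happen outside the template:

    {
        "name": "NewProdADF/integrationRuntimeProd",
        "type": "Microsoft.DataFactory/factories/integrationruntimes",
        "apiVersion": "2018-06-01",
        "properties": {
            "type": "SelfHosted",
            "description": "Fresh IR, registered against an IR server post-deployment"
        }
    }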

    Thanks

    Mark.

    UPDATE2:  Scratch update 1! - it is connected to self-hosted IRs after all.  The issue turns out to be that the deployment tries to encrypt the credentials using the active IR.  Because this is a deployment from a source template, the destination IR record gets added, but there is no IR server currently configured or active and no link to a shared IR server, so the credentials cannot be encrypted and the deployment fails.  For me, the simple solution for MSI is: don't encrypt.  For service principals, the product team will likely need to figure out how to resolve this chicken-and-egg between linked services and IR configuration.

    UPDATE:  To prove this is an MSI error in the first issue, I edited the template, added a service principal and encrypted credential to the Data Lake resource, and left everything else the same.  It deploys successfully.  It turns out the self-hosted IR is not relevant to this issue, but it is relevant to the second issue.

    i.e. this service principal version succeeds:

    {
        "name": "DLAKE01_LS",
        "type": "Microsoft.DataFactory/factories/linkedservices",
        "properties": {
            "description": "blah",
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": "https://d****lake01.azuredatalakestore.net/webhdfs/v1",
                "servicePrincipalId": "abc***def",
                "tenant": "********-****-****-****-************",
                "subscriptionId": "********-****-****-****-************",
                "resourceGroupName": "*********",
                "encryptedCredential": "ew0*********fQ=="
            }
        }
    }

    This MSI version fails to deploy:

    {
        "name": "DLAKE01_LS",
        "type": "Microsoft.DataFactory/factories/linkedservices",
        "properties": {
            "description": "blah",
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": "https://d****lake01.azuredatalakestore.net/webhdfs/v1",
                "tenant": "********-****-****-****-************",
                "subscriptionId": "********-****-****-****-************",
                "resourceGroupName": "*********"
            }
        }
    }




    Wednesday, November 7, 2018 12:02 AM

All replies

  • Hi MarkAtAgilliance,

    When you use the RM export template, the credential properties in the original linked service are removed and need to be re-entered.  A linked service with MSI works in our PROD environment; we verified it with the following payload in a deployment.  If you still hit the issue, please open an incident with the ADF team.

    {
        "name": "DLAKE01_LS",
        "type": "Microsoft.DataFactory/factories/linkedservices",
        "properties": {
            "description": "blah",
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": "https://d****lake01.azuredatalakestore.net/webhdfs/v1",
                "tenant": "********-****-****-****-************",
                "subscriptionId": "********-****-****-****-************",
                "resourceGroupName": "*********"
            }
        }
    }

    Thanks!


    Wednesday, November 7, 2018 8:01 AM
  • Hi, thanks for this.  If I just deploy a Data Lake linked service as above, then yes, it works.

    My mistake, but I left out the crucial bit: the connectVia, as below.  This results in the MSI version failing because the integration runtime is not set up when deployed by template.  It is added as an integration runtime but is not registered with an integration runtime server.

    {
        "name": "DLAKE01_LS",
        "type": "Microsoft.DataFactory/factories/linkedservices",
        "properties": {
            "description": "blah",
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": "https://d****lake01.azuredatalakestore.net/webhdfs/v1",
                "tenant": "********-****-****-****-************",
                "subscriptionId": "********-****-****-****-************",
                "resourceGroupName": "*********"
            },
            "connectVia": {
                "referenceName": "integrationRuntime1",
                "type": "IntegrationRuntimeReference"
            }
        }
    }

    This is the deployment error:

    2018-11-07T22:43:52.5909128Z ##[error]BadRequest: {
      "error": {
        "code": "BadRequest",
        "message": "Failed to encrypt sub-resource payload {
      "Id": "/subscriptions/********-****-****-****-************/resourceGroups/Datawarehouse/providers/Microsoft.DataFactory/factories/*************/linkedservices/DLAKE01_LS",
      "Name": "DLAKE01_LS",
      "Properties": {
        "description": "[Mark Davies] This links to the datalake store by name.  ",
        "annotations": [],
        "type": "AzureDataLakeStore",
        "typeProperties": {
          "dataLakeStoreUri": "********************",
          "tenant": "********************",
          "subscriptionId": "********************",
          "resourceGroupName": "********************"
        },
        "connectVia": {
          "referenceName": "integrationRuntime1",
          "type": "IntegrationRuntimeReference"
        }
      }
    } and error is: Failed to encrypted linked service credentials on self-hosted IR 'integrationRuntime1', reason is: NotFound, error message is: No online instance..",
        "target": "/subscriptions/********-****-****-****-************/resourceGroups/Datawarehouse/providers/Microsoft.DataFactory/factories/**************/linkedservices/DLAKE01_LS",
        "details": null
      }
    } undefined


    Could this also be related to this problem in the portal?  Something in the credential validation logic seems wrong in both cases.  When I change a service from service principal to MSI, it throws an error requiring a credential, even though you can see the test connection succeeded.


    Wednesday, November 7, 2018 9:47 PM
  • Hi Mark,

    For the first question: if you reference an integration runtime in a linked service via "connectVia", that integration runtime must already have been created and exist in our PROD metadata, otherwise the deployment fails directly.  If you reference a self-hosted IR, that IR must be registered and online first.  This is by design: to safely protect the secrets nested in a customer's linked service, we use the IR type to determine how to encrypt them and where to store the linked service credentials.
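
    For illustration, a minimal sketch (the factory name, resource names, and apiVersion are assumptions) of an RM template fragment that declares the IR record and makes the linked service deployment depend on it.  Note this only handles ordering and existence in metadata; it does not bring a self-hosted IR online, so the encryption step can still fail:

    "resources": [
        {
            "name": "NewProdADF/integrationRuntime1",
            "type": "Microsoft.DataFactory/factories/integrationruntimes",
            "apiVersion": "2018-06-01",
            "properties": {
                "type": "SelfHosted"
            }
        },
        {
            "name": "NewProdADF/DLAKE01_LS",
            "type": "Microsoft.DataFactory/factories/linkedservices",
            "apiVersion": "2018-06-01",
            "dependsOn": [
                "[resourceId('Microsoft.DataFactory/factories/integrationruntimes', 'NewProdADF', 'integrationRuntime1')]"
            ],
            "properties": {
                "type": "AzureDataLakeStore",
                "typeProperties": {
                    "dataLakeStoreUri": "https://d****lake01.azuredatalakestore.net/webhdfs/v1"
                },
                "connectVia": {
                    "referenceName": "integrationRuntime1",
                    "type": "IntegrationRuntimeReference"
                }
            }
        }
    ]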

    For the second question: if you update a linked service and change the auth type, by design it requires you to re-enter all of the linked service credentials.  The error you see is a generic prompt; because you changed the auth type from service principal to MSI, there are no credentials to enter, so just ignore the prompt as long as the test connection succeeds.

    Thanks!


    Friday, November 9, 2018 8:36 AM
  • Hi Zhangyi,

    Thanks so much for the response.  

    One problem I ran into in the first scenario when doing the continuous integration deployment: if I set up an integration runtime in the PROD environment first, the template fails to deploy because it detects a conflict between the development integration runtime in the source resource template and the PROD integration runtime we just created.  If I give it a different name, then the above failure occurs for the reasons you gave.

    Looking at this another way, do you have a successful process by which you have deployed a Data Factory through continuous integration as described in the documentation?  The scenario must have a self-hosted integration runtime and linked services that use MSI rather than a service principal.

    I can successfully do this with the Microsoft-hosted integration runtime, but I have not been successful with a self-hosted integration runtime, using either an integration runtime defined in the source template or a pre-created target runtime with the same name.  I either get the encryption error or a conflict error.

    One question - in your recommendation, can the integration runtime in PROD be a linked integration runtime from a shared runtime in the development factory, or does it have to be its own runtime (its own runtime agent/VM)?  The scenario I tried before had a linked runtime set up, and this resulted in the deployment conflict.  I have not tried a completely independent runtime on the PROD box.
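
    For reference, this is roughly how I understand a linked IR would be declared (a sketch only - the factory names and resource ID are placeholders, and it assumes the development factory has shared the runtime and granted this factory access):

    {
        "name": "NewProdADF/linkedIntegrationRuntime1",
        "type": "Microsoft.DataFactory/factories/integrationruntimes",
        "apiVersion": "2018-06-01",
        "properties": {
            "type": "SelfHosted",
            "typeProperties": {
                "linkedInfo": {
                    "resourceId": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.DataFactory/factories/<devFactory>/integrationruntimes/<sharedIR>",
                    "authorizationType": "Rbac"
                }
            }
        }
    }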

    Regards, Mark.



    Friday, November 9, 2018 10:34 PM
  • My latest work around is this:

    1. Change the Data Lake linked service on the source (dev) factory to use the auto (Azure) integration runtime - see the sketch after this list.  (Note that the other services can remain linked to the self-hosted integration runtime.)
    2. Export the RM template.
    3. Create a new data factory.
    4. Deploy the saved template to the new data factory.  It will deploy successfully with both the auto integration runtime and the original self-hosted IR, but the self-hosted IR will be in an error state.
    5. Create a new integration runtime (or link to a shared one), but give it a different name from the one used by the other linked services.
    6. Change the Data Lake linked service to use the newly created integration runtime.
    7. Change the other services to use the newly created integration runtime.
    8. Delete the integration runtime that was imported with the template.
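
    For step 1, the connectVia reference on the Data Lake linked service points at the default Azure runtime rather than the self-hosted one - roughly like this, assuming the default name AutoResolveIntegrationRuntime:

    {
        "name": "DLAKE01_LS",
        "type": "Microsoft.DataFactory/factories/linkedservices",
        "properties": {
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": "https://d****lake01.azuredatalakestore.net/webhdfs/v1",
                "tenant": "********-****-****-****-************",
                "subscriptionId": "********-****-****-****-************",
                "resourceGroupName": "*********"
            },
            "connectVia": {
                "referenceName": "AutoResolveIntegrationRuntime",
                "type": "IntegrationRuntimeReference"
            }
        }
    }

    (Alternatively, simply removing connectVia from the linked service should have the same effect, since the default Azure IR is used when no IR reference is given.)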

    Now I have a deployed UAT environment that used the development environment as a template, and it has a configured integration runtime that is used by all the services.

    One thing I will try next is to reconfigure the errored self-hosted IR from step 4, to see if I can get it re-connected as a linked IR or connected to a new IR (vs. creating a new one in step 5), but I am not hopeful, since the data services have a dependency on it.

    Mark.

    Tuesday, November 13, 2018 3:28 AM