When to use Service Principals vs RBAC?

  • Question

  • For authentication, RBAC can be leveraged for roles using service accounts, but service principals can also be used with the services below. I would love feedback on which option might be better for each of the following services in the way they connect to one another. What is the best practice for each resource when setting up authentication, out of those two options?

    Databricks

    Data Factory

    SQL Data Warehouse

    Event Hubs

    Azure Data Lake Store Gen 2

    Analysis Services

    Wednesday, March 25, 2020 6:16 PM

All replies

  • Hello anonymous78617861 and thank you for your question.  This is actually a large subject, with nuances that are better answered by including Managed Identity in the discussion.  In some cases, the lines between service principal and service account can blur.  Some might say that service principals are service accounts for the cloud.

    Azure has a notion of a Service Principal which, in simple terms, is a service account; on Windows and Linux it plays the same role a service account does.
    There is enough to be said that I would like to break this discussion into parts.  I will share details focused on one of the services now.  Please tell me if this is the sort of information you are looking for, or reply with an example of using a service account instead of a service principal.  Following that I will respond with details on the other services.
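    In concrete terms, a service principal authenticates with the Azure AD client-credentials flow: the app registration's client ID and secret (or certificate) are exchanged for a bearer token at the tenant's token endpoint. A minimal sketch of building that request, with placeholder tenant/client values (not real identifiers):

    ```python
    from urllib.parse import urlencode

    def build_token_request(tenant_id: str, client_id: str,
                            client_secret: str, scope: str):
        """Build the Azure AD v2.0 client-credentials token request.

        Returns the token endpoint URL and the form body that would be
        POSTed to it. All values passed in below are placeholders.
        """
        url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
        body = urlencode({
            "grant_type": "client_credentials",  # the service principal flow
            "client_id": client_id,              # the app registration's ID
            "client_secret": client_secret,      # or a certificate assertion
            "scope": scope,                      # resource being requested
        })
        return url, body

    url, body = build_token_request(
        "00000000-0000-0000-0000-000000000000",  # tenant (placeholder)
        "11111111-1111-1111-1111-111111111111",  # client ID (placeholder)
        "app-secret",                            # secret (placeholder)
        "https://storage.azure.com/.default",    # e.g. token for storage
    )
    ```

    The returned token is then sent as an `Authorization: Bearer` header, which is what the Azure SDKs do for you behind the scenes.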

    Azure Data Lake Store Gen 2:

    There are a number of ways to authenticate to ADLS Gen2, and they affect the granularity of access control available.  The account key gives access to everything, with no way to readily identify who made a change.

    A service principal can be used with or without RBAC, and this makes a significant difference.  RBAC can be assigned at the account or container level, but it will override Access Control Lists intended for more granular control.  If you want to use ACLs, do not assign RBAC permissions; use the service principal's object ID in the ACLs as if it were a service account.
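    ADLS Gen2 ACL entries are POSIX-style strings of the form `user:<object-id>:rwx`, with the service principal's object ID where a user name would go. A hypothetical helper for composing such entries (the object ID below is a placeholder):

    ```python
    def acl_entry(object_id: str, permissions: str, default: bool = False) -> str:
        """Compose one POSIX-style ACL entry for ADLS Gen2.

        A service principal's object ID goes in the user slot; a
        'default' entry is inherited by newly created children.
        """
        if len(permissions) != 3 or any(c not in "rwx-" for c in permissions):
            raise ValueError("permissions must look like 'r-x' or 'rwx'")
        entry = f"user:{object_id}:{permissions}"
        return f"default:{entry}" if default else entry

    # Grant a (placeholder) service principal read+execute on a directory,
    # both on the directory itself and inherited by new children:
    sp_oid = "22222222-2222-2222-2222-222222222222"
    acl = ",".join([
        acl_entry(sp_oid, "r-x"),
        acl_entry(sp_oid, "r-x", default=True),
    ])
    ```

    A comma-joined list like this is the shape accepted by the ADLS Gen2 set-ACL operations (for example the `--acl` argument of `az storage fs access set`).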

    If you want Data Factory to access ADLS Gen2 behind a firewall, you must either, A) use Managed Identity and enable Trusted Services, or, B) whitelist the IP / CIDR ranges for the Data Factory service in your region.

    Databricks has an additional method to access ADLS Gen2 called Credential Passthrough.  This is only available with the Azure Databricks premium plan.  Without it, standard practice is to configure the cluster to authenticate with a service principal, storing the credentials in a secret scope.
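    The cluster-side configuration for that standard practice is a handful of Hadoop ABFS settings. A sketch of the key/value pairs involved, with `get_secret` standing in for Databricks' `dbutils.secrets.get` and the storage account and tenant names as placeholders:

    ```python
    def get_secret(scope: str, key: str) -> str:
        # Stand-in for dbutils.secrets.get(scope, key) inside Databricks;
        # returns a marker string here so the sketch runs anywhere.
        return f"<secret:{scope}/{key}>"

    storage_account = "mystorageacct"                    # placeholder account
    tenant_id = "00000000-0000-0000-0000-000000000000"   # placeholder tenant

    # Hadoop ABFS settings for OAuth with a service principal; on a real
    # cluster you would spark.conf.set(k, v) for each pair.
    suffix = f"{storage_account}.dfs.core.windows.net"
    abfs_conf = {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}":
            get_secret("my-scope", "sp-client-id"),
        f"fs.azure.account.oauth2.client.secret.{suffix}":
            get_secret("my-scope", "sp-client-secret"),
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    ```

    Keeping the client ID and secret in a secret scope (rather than notebook text) is the point: the notebook only ever references the scope and key names.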

    Thursday, March 26, 2020 5:48 AM
  • This sort of response is great for ADLS Gen 2. Would definitely love to hear this sort of response for the other services as well. 

    Thanks for the response!

    Thursday, March 26, 2020 3:47 PM
  • Thank you for the feedback.

    Azure SQL Data Warehouse:

    There are multiple ways to connect to the SQL server.  Service principals/accounts can even be added directly with a statement such as CREATE USER [Fritz@contoso.com] FROM EXTERNAL PROVIDER;  (for an Active Directory user) or CREATE USER [MyServicePrincipalName] FROM EXTERNAL PROVIDER;  (for an app registration or security group).

    That said, Data Factory can connect using SQL Authentication, a Service Principal, or Managed Identity.  Managed Identity is nice because there are no passwords to remember: just go to the server's RBAC settings and add the Data Factory.

    If you want your SQL server to connect directly to storage such as Blob, you must turn on "Allow Trusted Microsoft Services" on the SQL server.  You can use an account key or SAS token for this.  A SAS token allows you to specify the scope of access (read/write and which folder) as well as the duration, making it safer to use than the account key.
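    Under the hood, a SAS token is just an HMAC-SHA256 signature computed with the account key over a "string to sign" that encodes the permissions, expiry, and resource; the service recomputes and checks the signature rather than storing tokens. A much-simplified sketch of that mechanism (the real Azure string-to-sign has several more newline-separated fields, and the key here is a placeholder):

    ```python
    import base64
    import hashlib
    import hmac

    def sign_sas(account_key_b64: str, permissions: str,
                 expiry: str, resource: str) -> str:
        """Illustrative SAS-style signature: HMAC-SHA256 over a simplified
        string-to-sign. The real Azure service SAS adds more fields
        (start time, IP range, protocol, API version, ...)."""
        string_to_sign = "\n".join([permissions, expiry, resource])
        key = base64.b64decode(account_key_b64)
        digest = hmac.new(key, string_to_sign.encode("utf-8"),
                          hashlib.sha256).digest()
        return base64.b64encode(digest).decode("utf-8")

    # Read-only access to one container, expiring at a fixed time
    # (all values are placeholders):
    sig = sign_sas(base64.b64encode(b"placeholder-key").decode(),
                   "r", "2020-04-01T00:00:00Z", "/blob/myaccount/staging")
    ```

    Because the permissions and expiry are inputs to the signature, a holder of the token cannot widen its scope or extend its lifetime without invalidating it, which is why a SAS token is a safer handout than the raw account key.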

    Databricks can connect to Azure Synapse / Azure SQL Data Warehouse.  A storage account is used for staging the data between them.  Please read more here.  Both a JDBC connection string and storage account authentication are involved.

    Friday, March 27, 2020 10:18 PM
    I have a correction to make: the SSIS Azure Storage Connection Manager apparently CAN use a service principal.

    Azure DataBricks:

    Databricks connects to Event Hubs using a connection string and SAS key.  Please take a look at this tutorial.
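    That connection string bundles the namespace endpoint, the SAS policy name, its key, and optionally the entity (hub) name into one semicolon-separated string. A sketch of pulling it apart (the values are placeholders, not a real key):

    ```python
    def parse_eventhubs_connstr(connstr: str) -> dict:
        """Split an Event Hubs connection string of the form
        Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...;EntityPath=...
        into its key/value parts."""
        parts = {}
        for segment in connstr.strip().strip(";").split(";"):
            # partition() splits only on the first '=', so base64 key
            # padding ('=') in the value survives intact.
            key, _, value = segment.partition("=")
            parts[key] = value
        return parts

    conn = ("Endpoint=sb://mynamespace.servicebus.windows.net/;"
            "SharedAccessKeyName=listen-policy;"
            "SharedAccessKey=placeholderkey=;"
            "EntityPath=myhub")
    parts = parse_eventhubs_connstr(conn)
    ```

    In Databricks, the whole string is best kept in a secret scope and handed to the Event Hubs connector as one opaque value rather than being assembled in notebook code.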

    Data Factory has several connectors for Azure Databricks.  You can use them to run Notebooks, Jar files, or Python files.  While it is possible to return a value from Databricks to Data Factory, the mechanism is not suited to returning significant amounts of data.  Data Factory's role with Databricks is more one of coordination and orchestration.  Databricks should write the output itself, rather than trying to pass it back to the factory for writing.

    I already talked about Databricks with Data Lake Store Gen2 and with Data Warehouse.

    Event Hubs:

    Event Hubs has an optional 'Capture' mechanism which saves to storage, including Data Lake Store Gen2.  After capture, the data can be pulled by Data Factory or other services that do not natively support streaming.
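    Capture lands Avro blobs under a configurable path pattern; the default is `{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}`. A sketch of rendering that pattern so a downstream job can locate captured files (the namespace and hub names are placeholders):

    ```python
    from datetime import datetime, timezone

    DEFAULT_CAPTURE_PATTERN = (
        "{Namespace}/{EventHub}/{PartitionId}/"
        "{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
    )

    def capture_blob_prefix(namespace: str, eventhub: str,
                            partition_id: int, when: datetime) -> str:
        """Render the default Event Hubs Capture path pattern for one
        partition and timestamp, as a downstream reader (e.g. a Data
        Factory pipeline) would enumerate it."""
        return DEFAULT_CAPTURE_PATTERN.format(
            Namespace=namespace, EventHub=eventhub, PartitionId=partition_id,
            Year=f"{when.year:04d}", Month=f"{when.month:02d}",
            Day=f"{when.day:02d}", Hour=f"{when.hour:02d}",
            Minute=f"{when.minute:02d}", Second=f"{when.second:02d}",
        )

    prefix = capture_blob_prefix(
        "mynamespace", "myhub", 0,
        datetime(2020, 3, 30, 21, 43, 0, tzinfo=timezone.utc))
    ```

    Zero-padding the date parts keeps lexicographic blob listing in time order, which is what makes incremental pickup by Data Factory straightforward.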

    Azure Analysis Services:

    Analysis Services integrates with Azure Active Directory, and has a full article on service principals I recommend you look at.

    List of data sources supported by Analysis Services.

    For connecting to Data Lake Store Gen2, I recommend caution.  While the above document says you can use the Azure Blob Storage connector, there is the possibility of unforeseen difficulties.  The two storage protocols are not interchangeable.

    For connecting to Data Warehouse:

    When specified as a provider data source in tabular 1200 and higher models, both in-memory and DirectQuery models require Microsoft OLE DB Driver for SQL Server MSOLEDBSQL (recommended), SQL Server Native Client 11.0, or .NET Framework Data Provider for SQL Server.

    Monday, March 30, 2020 9:43 PM
  • Just checking to see if the above answer was helpful. If this answers your query, please consider clicking “Mark as Answer” and "Up-Vote", as it might be beneficial to other community members reading this thread. And if you have any further queries, do let us know.
    Thursday, April 2, 2020 11:09 PM