Security best practices for Azure Data Lake creation


  • Hi,

    I am trying to come up with an ARM template for a data lake within the organization. Ideally there would be just one data lake, with various folder structures beneath it to represent the different business areas.

    For the time being, I am concentrating on creating the data lake through an ARM template; the appropriate folder structures will be created later. I have gone through the Azure Data Lake best practices, which recommend creating security groups as soon as you create the lake.

    1. Who should be the owner of the Data Lake? Given that it is enterprise-wide, specific application IDs or app-specific groups can't be the owner. I believe the owner is automatically set to the AAD account that creates the Data Lake, in this case the account that executes the ARM template. Should there ideally be other owners or roles? Let's take an example.

    If my structure is 

    /Finance/assets/2018/04/01/SomeFile.avro

    Then the service principal for the application writing those Avro files should be granted Execute permission on the root of the Data Lake and read/write permissions on /Finance.

    I tried this approach with the Capture feature of Event Hubs, and until I granted Execute permission on the root to Microsoft.EventHubs, it kept returning an access-denied error.

    Doesn't that mean that every time we onboard an application that needs to write data, it must be granted Execute permission on the root? And once it is granted Execute on the root, it will also have the same Execute permission on the other subject areas, and if those subject areas are huge, applying it could potentially take a really long time. Is that a correct observation?

    2. Should the security then be defined in the ARM template, or applied after deployment by a separate add-on script, such as the PowerShell script that Microsoft describes in its best-practices document on ADL security?

    Right now, I have an ARM template without the security part, so I am looking for pointers on how best to approach it.
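    The permission layout from the /Finance example can be written down as data that a post-deployment script could apply. This is a minimal sketch, assuming POSIX-style ACL spec strings of the form `group:<objectId>:rwx` and `default:group:<objectId>:rwx`; `ancestors` and `writer_acl_plan` are hypothetical helpers and the object ID is a placeholder. It defaults to `group` entries on the assumption that ACLs are assigned to AAD security groups (holding the writer's service principal) rather than to individual principals. Wiring the specs into an actual ACL call (for example `modify_acl_entries` in the `azure-datalake-store` Python SDK) is not shown.

```python
from typing import Dict, List


def ancestors(path: str) -> List[str]:
    """Return every folder above `path`, starting at the root '/'."""
    parts = [p for p in path.strip("/").split("/") if p]
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def writer_acl_plan(object_id: str, area_folder: str, kind: str = "group") -> Dict[str, str]:
    """Map each folder to the ACL spec a writer needs (hypothetical helper).

    - '--x' (traverse only) on every ancestor folder, including the root;
    - 'rwx' on the business-area folder itself, plus a matching 'default:'
      entry so that items created under it later inherit the same access.
    """
    plan = {folder: f"{kind}:{object_id}:--x" for folder in ancestors(area_folder)}
    target = "/" + area_folder.strip("/")
    plan[target] = f"{kind}:{object_id}:rwx,default:{kind}:{object_id}:rwx"
    return plan


if __name__ == "__main__":
    # Placeholder object ID of an AAD security group that holds the
    # Finance writer's service principal.
    group_id = "00000000-0000-0000-0000-000000000000"
    for folder, spec in writer_acl_plan(group_id, "/Finance").items():
        print(folder, spec)
```

    Note that the plan touches only the root and the folders on the way down to /Finance; sibling areas such as /HR get no entry. Because `default:` entries only apply to items created after they are set, folders that already contain data would additionally need a recursive update, and that is the step whose run time grows with the number of existing files.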

    Thanks in advance.


    EDIT: I have gone through Melissa Coates's very detailed and helpful blog post here: https://www.sqlchick.com/entries/2018/3/16/assigning-resource-management-permissions-for-azure-data-lake-store-part-2

    It does mention that "Typically, automated processes which do need access to the data (discussed in Part 3), don't need any access to the ADLS resource itself". Why, then, does Event Hubs Capture need Execute permission on the resource (ADL)?
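    The access-denied behaviour seen with Event Hubs Capture can be illustrated with a simplified model of the check: creating a file requires `x` on every folder from the root down to the target folder plus `w` on the target folder, and each folder is checked against its own entries only. This is a sketch under stated assumptions: it models named-principal entries only and ignores owner, owning group, other, mask and super-user rules; `folder_chain`, `can_create_file` and the principal names are hypothetical. In this model the Execute being granted is an ACL entry on the root folder (the data plane), which is assumed to be a separate mechanism from the RBAC roles on the ADLS resource that the quoted blog post discusses.

```python
from typing import Dict, List

# Simplified ACL table: folder path -> {principal: "rwx"-style permission string}.
# Only named-principal entries are modelled (no owner/group/other/mask/super-user).
Acl = Dict[str, Dict[str, str]]


def folder_chain(folder: str) -> List[str]:
    """Return the root, every intermediate folder, and `folder` itself."""
    parts = [p for p in folder.strip("/").split("/") if p]
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def can_create_file(acl: Acl, principal: str, folder: str) -> bool:
    """True if `principal` may create a file directly inside `folder`.

    Each folder is checked against its own entries only: 'x' is needed on
    every folder in the chain, and 'w' additionally on `folder` itself.
    """
    chain = folder_chain(folder)
    for path in chain:
        if "x" not in acl.get(path, {}).get(principal, "---"):
            return False
    return "w" in acl.get(chain[-1], {}).get(principal, "---")


if __name__ == "__main__":
    capture = "eventhubs-capture"  # hypothetical principal name
    acl: Acl = {"/Finance": {capture: "rwx"}, "/HR": {"hr-app": "rwx"}}
    print(can_create_file(acl, capture, "/Finance"))  # False: no 'x' on '/'
    acl["/"] = {capture: "--x"}
    print(can_create_file(acl, capture, "/Finance"))  # True
    print(can_create_file(acl, capture, "/HR"))       # False: '/HR' has no entry for it
```

    The last call shows that, in this model, traverse permission on the root does not by itself open up another subject area such as /HR, because /HR is checked against its own ACL.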

    Wednesday, February 6, 2019 11:29 AM

All replies

  • Hi Saugat,

    Please refer to the following doc for security recommendations on Azure Data Lake Storage Gen2:

    https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#security-considerations

    Hope this helps.



    Wednesday, February 13, 2019 9:06 AM
    Moderator