DataLake and Environments - Best Practice

  • Question

  • All,

    Is it best practice to have one big DataLake shared by all environments (Dev, Stage, QA, and Prod), or to have one DataLake for Prod and another for Non-Prod, and so on?

    If we choose to share a data lake across environments, auditing will play a major role. It would really help if others could share their experience and guidance.

    Thanks,

    rgn

    Wednesday, February 12, 2020 12:41 AM

Answers

  • Hello,

    You may check out “FAQs about organizing a Data Lake”, which addresses your question:

    If I need separate dev, test, and prod environments, how would this usually be handled?

    Usually separate environments are handled with separate services. For instance, in Azure, that would be 3 separate Azure Data Lake Storage resources (which might be in the same subscription or different subscriptions).
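    To make the “separate services” approach concrete, here is a minimal Python sketch that maps each environment to its own ADLS Gen2 account and builds an `abfss://` URI for the same relative path in each one. The account names (`contosodatalakedev`, etc.) and the `datalake` container name are hypothetical placeholders; only the `abfss://<container>@<account>.dfs.core.windows.net/<path>` URI shape is the standard ADLS Gen2 form.

```python
# Hypothetical storage account names: one ADLS Gen2 account per environment.
ACCOUNTS = {
    "dev": "contosodatalakedev",
    "test": "contosodatalaketest",
    "prod": "contosodatalakeprod",
}

CONTAINER = "datalake"  # hypothetical container (file system) name


def abfss_uri(env: str, relative_path: str) -> str:
    """Build an abfss:// URI for the given environment.

    The relative path is identical in every environment; only the
    storage account (i.e. the service) changes.
    """
    if env not in ACCOUNTS:
        raise ValueError(f"unknown environment: {env!r}")
    account = ACCOUNTS[env]
    path = relative_path.strip("/")
    return f"abfss://{CONTAINER}@{account}.dfs.core.windows.net/{path}"


if __name__ == "__main__":
    for env in ("dev", "test", "prod"):
        print(abfss_uri(env, "raw/sales/2020/02/12"))
```

    Because only the account differs, promoting a pipeline from dev to prod is a configuration change rather than a path change.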

    We wouldn’t usually separate dev/test/prod with a folder structure in the same data lake. It can be done (just as you could use the same database with a different schema for dev/test/prod), but it’s not the typically recommended way of handling the separation. We prefer having the exact same folder structure across all 3 environments. If you must make do with a single data lake (one service), then the environment should be the top-level node.
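    For the single-lake fallback, a minimal Python sketch of “environment as the top-level node” might look like this. The `lake_path` helper and the environment names are hypothetical; the point is that everything below the first segment is identical across environments, and that a path cannot climb out of its environment's folder via `..`.

```python
from pathlib import PurePosixPath

# Hypothetical environment names sharing a single data lake.
ENVIRONMENTS = ("dev", "test", "prod")


def lake_path(env: str, *parts: str) -> str:
    """Return a lake path whose top-level node is the environment.

    Below that node every environment uses the same folder structure,
    e.g. dev/raw/sales/... and prod/raw/sales/...
    """
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    # Drop leading/trailing slashes so no part can reset the path to the root.
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    relative = PurePosixPath(*cleaned)
    # Refuse ".." segments, which could climb into another environment's node.
    if ".." in relative.parts:
        raise ValueError("path must not contain '..'")
    return str(PurePosixPath(env) / relative)


if __name__ == "__main__":
    print(lake_path("prod", "raw", "sales", "2020/02/12"))
    # -> prod/raw/sales/2020/02/12
```

    One reason to put the environment first is that permissions and auditing can then be scoped to a single directory per environment, rather than being repeated across many subfolders.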

    Regarding monitoring in ADLS Gen2:

    Azure Data Lake Storage Gen2 provides metrics in the Azure portal under the Data Lake Storage Gen2 account and in Azure Monitor. The availability of Data Lake Storage Gen2 is displayed in the Azure portal, but to get the most up-to-date availability of a Data Lake Storage Gen2 account, you must run your own synthetic tests. Other metrics, such as total storage utilization, read/write requests, and ingress/egress, can be consumed by monitoring applications and can also trigger alerts when thresholds (for example, average latency or number of errors per minute) are exceeded.
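    The two ideas above (a synthetic availability test and a threshold alert) can be sketched in plain Python. This does not call any Azure API: `probe` is a hypothetical caller-supplied function (for example, one that writes and reads back a small test file in the account), and in practice threshold alerts would be configured in Azure Monitor. The sketch only shows the logic.

```python
from typing import Callable, List, Sequence


def synthetic_availability(probe: Callable[[], bool], attempts: int) -> float:
    """Run a caller-supplied probe `attempts` times; return the success percentage.

    `probe` is hypothetical: it should return True when the account
    responded correctly. Any exception is counted as a failed probe.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    successes = 0
    for _ in range(attempts):
        try:
            if probe():
                successes += 1
        except Exception:
            pass  # treat any error as a failed probe
    return 100.0 * successes / attempts


def threshold_breaches(samples: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of samples (e.g. errors per minute) above the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


if __name__ == "__main__":
    outcomes = iter([True, False, True, True])
    print(synthetic_availability(lambda: next(outcomes), 4))  # 75.0
    print(threshold_breaches([0, 3, 12, 5, 20], threshold=10))  # [2, 4]
```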

    For more details, refer to “Best practices for using Azure Data Lake Storage Gen2”.

    Hope this helps. Do let us know if you have any further queries.


    • Marked as answer by grajee Thursday, February 13, 2020 3:42 PM
    Thursday, February 13, 2020 8:19 AM