Data Governance Solution Approach for Data Lakes

  • Question

  • Hi All,

    I am evaluating how to implement a Data Governance solution with Azure Data Catalog for a Data Lake batch transformation pipeline. My approach is outlined below. Any insights would be welcome.

    • Data Factory can't capture lineage from the source systems to the Data Lake, so I am skipping this.
    • I know Data Catalog cannot maintain business rules for data curation on the Data Lake.
    • First, the data feed is onboarded manually in Azure Data Catalog under a given business glossary, etc. Alternatively, when a raw data feed is ingested into Data Lake Storage, the asset is created automatically under that business glossary (if it does not already exist).
    • The raw data is cleaned, classified and tagged during a light transformation on the lake, so the related tags need to be created in Data Catalog (custom code calling the Azure Data Catalog REST APIs).
    • Then ETL processing runs, and the new data assets it produces need to be created, with tags, in Data Catalog. The tools are Spark-based (again, custom code calling the Azure Data Catalog REST APIs).
    • Finally, Data Catalog displays all data assets created by the Data Lake batch transformation pipeline under the specific business glossary, with the right tags.
    • I am skipping operational metadata and full lineage, as there is no such solution among the Azure offerings; this would again need to be a custom solution.
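    The custom-code steps above could be orchestrated roughly as follows. This is a minimal Python sketch under stated assumptions: `InMemoryCatalog`, `ensure_asset`, `add_tags`, `record_lineage` and `run_stage` are hypothetical names, and the in-memory class only stands in for code that would wrap the Azure Data Catalog REST APIs (the real endpoints and payloads are not shown). It illustrates the "create the asset only if it does not exist, then tag it" pattern from the list, plus a simple custom lineage record for the part that has no Azure offering.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical in-memory stand-in for a catalog asset."""
    path: str
    glossary_term: str
    tags: set = field(default_factory=set)


class InMemoryCatalog:
    """Hypothetical stand-in for a client wrapping the Azure Data Catalog REST APIs.

    In a real pipeline each method would issue an authenticated HTTP request;
    those calls (endpoints, payloads) are deliberately not modeled here.
    """

    def __init__(self) -> None:
        self.assets = {}   # path -> Asset
        self.lineage = []  # list of (source, target, stage) tuples

    def ensure_asset(self, path: str, glossary_term: str) -> Asset:
        # Create the asset only if it does not exist yet, so re-runs are idempotent.
        asset = self.assets.get(path)
        if asset is None:
            asset = Asset(path, glossary_term)
            self.assets[path] = asset
        return asset

    def add_tags(self, path: str, tags: list) -> None:
        # Set semantics: re-tagging on a re-run adds no duplicates.
        self.assets[path].tags.update(tags)

    def record_lineage(self, source: str, target: str, stage: str) -> None:
        # Custom lineage store, kept outside the catalog itself.
        edge = (source, target, stage)
        if edge not in self.lineage:
            self.lineage.append(edge)


def run_stage(catalog, stage, source, target, glossary_term, tags):
    """Register one pipeline stage's output, tag it and link it to its input."""
    catalog.ensure_asset(target, glossary_term)
    catalog.add_tags(target, tags)
    if source is not None:
        catalog.record_lineage(source, target, stage)


if __name__ == "__main__":
    cat = InMemoryCatalog()
    term = "Sales/Orders"  # hypothetical glossary term
    run_stage(cat, "ingest", None, "raw/orders.csv", term, ["raw"])
    run_stage(cat, "clean", "raw/orders.csv", "clean/orders.parquet", term, ["cleaned", "pii"])
    run_stage(cat, "etl", "clean/orders.parquet", "curated/orders_daily", term, ["curated"])
    for path, asset in cat.assets.items():
        print(path, asset.glossary_term, sorted(asset.tags))
    print(cat.lineage)
```

    Keeping every stage idempotent matters in batch pipelines, because failed runs are usually retried from the start; swapping the in-memory class for a real REST client would leave `run_stage` unchanged.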

    I am looking for best practices and would appreciate your thoughts.

    Many thanks

    Cengiz

    Monday, May 4, 2020 1:34 PM

All replies

  • Hi CengizK71,

    Thanks for your query. We have reached out to the internal team for more information and will update this thread once we have a response.

    Thank you

    If a post helps to resolve your issue, please click "Mark as Answer" and/or "Vote as helpful" on that post. By marking a post as Answered and/or Helpful, you help others find the answer faster.

    Monday, May 4, 2020 7:08 PM
  • Hi CengizK71,

    Could you please tell us whether your query relates to ADLS Gen1 or ADLS Gen2?

    Thank you

    Wednesday, May 6, 2020 12:52 AM
  • Hi,

    Let's say Gen2. If that makes a big difference, we might also consider Gen1.

    Many thanks.

    Cengiz

    Wednesday, May 6, 2020 9:33 AM
    Thanks for your response. I wanted to confirm this because ADLS Gen2 is not a supported data source for Azure Data Catalog; only ADLS Gen1 is supported.
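    If a pipeline has to handle both generations, a small guard could route only Gen1 paths to the catalog registration code. A minimal Python sketch, assuming paths use the usual `adl://<account>.azuredatalakestore.net` (Gen1) and `abfs[s]://<filesystem>@<account>.dfs.core.windows.net` (Gen2) URI forms; `adls_generation` and `can_register_in_data_catalog` are hypothetical helper names, and the supported-source rule is taken from this reply.

```python
from typing import Optional
from urllib.parse import urlparse

# Host suffixes for the two ADLS generations.
GEN1_SUFFIX = ".azuredatalakestore.net"  # adl://<account>.azuredatalakestore.net/...
GEN2_SUFFIX = ".dfs.core.windows.net"    # abfss://<filesystem>@<account>.dfs.core.windows.net/...


def adls_generation(uri: str) -> Optional[str]:
    """Return "gen1", "gen2" or None for a storage URI, judged by scheme and host."""
    parsed = urlparse(uri)
    host = parsed.hostname or ""  # hostname drops any "filesystem@" prefix and is lowercased
    scheme = parsed.scheme.lower()
    if scheme in ("adl", "https") and host.endswith(GEN1_SUFFIX):
        return "gen1"
    if scheme in ("abfs", "abfss", "https") and host.endswith(GEN2_SUFFIX):
        return "gen2"
    return None


def can_register_in_data_catalog(uri: str) -> bool:
    # Per this thread, Azure Data Catalog supports ADLS Gen1 but not Gen2.
    return adls_generation(uri) == "gen1"
```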

    Thank you

    Wednesday, May 6, 2020 9:23 PM