Data Lineage in Data Lake Gen 2

  • Question

  • Hi,

    We are using Azure Data Lake Gen 2. As most best-practice guides point out, tracking data lineage for data within the Data Lake is extremely important.

    However, there is no documentation around this. We can certainly add some "metadata" in the form of key-value pairs when uploading a file, but what about proper lineage tracking?

    Any documentation or link surrounding this? 


    Monday, July 29, 2019 10:23 AM

All replies

  • Hi SaugatMukherjee,

    Could you please elaborate on "proper lineage tracking"? Do you have an example?


    [If a post helps to resolve your issue, please click the "Mark as Answer" of that post and/or click Answered "Vote as helpful" button of that post. By marking a post as Answered and/or Helpful, you help others find the answer faster. ]

    Monday, July 29, 2019 6:09 PM
    Moderator
  • Hi there,

    Could you please provide more information about what you mean by "proper lineage tracking", so that we can assist you better?

    Thank you.


    Tuesday, July 30, 2019 11:05 PM
    Moderator
  • Hi Saugat,

    From a data flow standpoint, Azure Data Factory provides some data lineage capability on the Azure data platform from an orchestration perspective. Please have a look here: https://www.youtube.com/watch?v=5KvqYF-y93s&t=5s.

    You might also want to look at Azure Data Catalog for basic metadata management: you can register data sources, annotate them, and so on. However, as of now Azure Data Lake Gen 2 is not supported in Azure Data Catalog.

    Hope this helps!

    Regards,

    Srihad

    Please mark this reply as answer if it solved your issue or vote as helpful if it helped, so that other forum members can benefit from it. Blog - srihashadari.com

    Wednesday, July 31, 2019 2:55 AM
  • Hi Srihad,

    Sorry for the late response, and thanks for your reply. Yes, I can't see any support for Gen2 either. We are not using Data Factory at the moment, so although that link is nice, it is not really applicable to us.

    We really need something like Apache Atlas, which seems to handle lineage quite neatly, as described here: https://aws.amazon.com/blogs/big-data/metadata-classification-lineage-and-discovery-using-apache-atlas-on-amazon-emr/

    Unfortunately, it seems that for the time being we have to attach key-value pairs carrying some logical information to these files (though that is far from ideal).
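
    As a rough sketch of that key-value workaround (not a real lineage solution): the helper name `build_lineage_metadata`, the `lineage_*` key names, and the example paths below are all hypothetical, and the key restriction is an assumption chosen to keep keys identifier-like. It only builds the dictionary; passing it to the storage upload call is left to whatever SDK is in use.

```python
import re
from datetime import datetime, timezone

# Assumption: keys are kept identifier-like so they are safe to send as
# metadata keys; adjust this to the rules of the store you actually use.
_KEY_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_lineage_metadata(source_paths, job_name, run_time=None, extra=None):
    """Build a flat str -> str dict recording where a file came from.

    run_time should be a timezone-aware datetime; it defaults to now (UTC).
    """
    run_time = (run_time or datetime.now(timezone.utc)).astimezone(timezone.utc)
    metadata = {
        # Hypothetical key names; nothing here is prescribed by Azure.
        "lineage_sources": ";".join(source_paths),
        "lineage_job": job_name,
        "lineage_created_utc": run_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    metadata.update(extra or {})
    # Reject anything that is not a plain string-to-string pair.
    for key, value in metadata.items():
        if not _KEY_PATTERN.match(key):
            raise ValueError(f"invalid metadata key: {key!r}")
        if not isinstance(value, str):
            raise ValueError(f"metadata value for {key!r} must be a string")
    return metadata


if __name__ == "__main__":
    # Example paths and job name are made up for illustration.
    print(build_lineage_metadata(["raw/sales/2019-07-29.csv"], "daily_sales_clean"))
```

    The limitation is the same one noted above: each file only records its immediate inputs, so reconstructing a full lineage graph would mean walking these pairs file by file.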

    Friday, August 2, 2019 9:02 AM
  • Hi SaugatMukherjee,

    Please feel free to share your feedback or suggestion in the product's UserVoice forum. All ideas and feedback shared there are monitored by the product engineering team, which will take appropriate action.


    Monday, August 5, 2019 7:21 PM
    Moderator