ADF integration with DevOps GIT

  • Question

  • Hi, 

    We just integrated ADF with DevOps Git.

    We have observed that the folder structure created in ADF is not reflected in the corresponding DevOps Git repo. For example, I create a folder called Proj_1 in ADF and create a pipeline "test1" under it. When I save this to Git, I cannot see that folder in the Git repo. Instead, the "Proj_1" folder is stored as a property of the "test1" pipeline.

    Please confirm my understanding is correct.

    If the above is correct, then I am assuming that the pipelines, datasets and connections of ONLY one project (for example, the Finance data mart) should be integrated with ONE repo. Please correct me if I am wrong.

    Then, if some of those connections and pipelines are needed by another project (e.g. the Marketing data mart), we need to create the same pipelines and datasets AGAIN, connected to a different repo for that project. Please correct me if I am wrong.


    Thursday, May 2, 2019 9:43 PM

Answers

  • I looked at the developer settings (network) on the browser when setting up git repo on ADF GUI and encountered HTTP GET 403 errors which I believe caused this issue. It's resolved now (I think it was a network security issue which was fixed subsequently)
    Wednesday, August 14, 2019 6:33 PM

All replies

  • Hello Shafi Data, and thank you for your inquiry.  I have confirmed in my own non-DevOps Git repo the same behavior.  The folder structure visible in the Factory GUI is not visible in the Git repo.  Since the GUI folders persist between sessions, this suggests that the folder information is stored elsewhere.

    The ramifications, and how this affects your projects, I am not clear on.

    Tuesday, May 7, 2019 6:23 PM
  • Upon closer inspection of the pipeline json files, I find the folder information is stored inside the pipeline file, towards the end of the file.

    The nesting appears to be ...properties.folder.name
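
    As a minimal sketch of reading that property, the Python below parses a pipeline definition and groups pipelines by their folder. The sample JSON is hypothetical and only mirrors the nesting described above (properties.folder.name); the helper names folder_of and group_by_folder are illustrative, not part of any ADF tooling.

```python
import json
from collections import defaultdict

# Hypothetical pipeline definition, shaped like the nesting described above.
SAMPLE_PIPELINE_JSON = """
{
  "name": "test1",
  "properties": {
    "activities": [],
    "folder": {"name": "Proj_1"}
  }
}
"""


def folder_of(pipeline: dict) -> str:
    """Return properties.folder.name, or "" when no folder is recorded."""
    return pipeline.get("properties", {}).get("folder", {}).get("name", "")


def group_by_folder(pipelines: list) -> dict:
    """Map each folder name to the names of the pipelines stored in it."""
    grouped = defaultdict(list)
    for item in pipelines:
        grouped[folder_of(item)].append(item["name"])
    return dict(grouped)


pipeline = json.loads(SAMPLE_PIPELINE_JSON)
print(folder_of(pipeline))
```

    Because the folder is only a property inside each file, a grouping like this is one way to rebuild the GUI folder view from the flat files in the repo.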

    I am not an expert in Git.  One time I used something called submodule which allowed me to include one repo inside another repo.  Perhaps there is some feature which will allow selective synchronization of files between repos, or some other solution.
    Tuesday, May 7, 2019 6:29 PM
  • Hi  MartinJaffer-MSFT,

    Thanks for the response. I am elaborating an ADF usage scenario. Please let me know if this is a valid scenario or not.

    Let's say I have 3 projects (e.g. Sales, Marketing, Finance). I created 3 folders named Sales, Marketing and Finance in my Dev ADF instance and created the required pipelines. All of these pipelines share the same connection.

    Do you recommend such an approach in ADF development (i.e. pipelines for multiple projects under the same ADF), or do you recommend a different ADF for each project?

    Now, coming back to "ADF integration with DevOps Git", do you recommend maintaining a separate Git repository for each project as standard practice?

    Example:

    Let's say we developed 10 pipelines to process data for my "Finance" project. Should all of these pipelines be integrated with a project-specific repository (e.g. FinanceRepo)?

    Now, we developed 6 pipelines to process data for my "Marketing" project. Should all of these pipelines be integrated with a project-specific repository (e.g. MarketingRepo)?

    Please correct me if I am wrong.


    Shafi

    Tuesday, May 7, 2019 10:48 PM
  • I am unaware of any standards involving both ADF and DevOps.

    My colleague and I have differing opinions on what the best option is.

    If your projects are closely related or tightly coupled, then a single combined repository & Data factory makes sense.  However if one of your projects decides to migrate away from the others, the change will require more work.

    I personally favor having a combined or hybrid solution.  Since all your pipelines share the same connection, this means they use the same resource.  If everything is in the same Data Factory, then you can coordinate your triggers so that jobs run consecutively, not concurrently.  You can better orchestrate the load on your resources.

    My colleague is in favor of separate factories and repos, and making everything as atomic as possible.

    While we disagree, the decision ultimately comes down to what is best for your specific situation.

    Tuesday, May 7, 2019 11:34 PM
  • I'm facing an issue while trying to integrate my ADF with an Azure DevOps repo. Once I set up the code repository in the ADF UI, I see all the folders created in my Azure DevOps repo (pipelines, datasets, linkedsets etc.), but in the ADF UI all my objects (pipelines, datasets etc.) disappear. When I switch back to Data Factory mode, I am able to view them. Also, I did not face this issue when I set up the ADF code repo on GitHub Enterprise instead of an Azure repo. It worked as expected.
    Monday, July 29, 2019 4:00 PM
  • Hi Sunita,

    I am merging this thread with another thread with the same exact issue. Someone should respond on the thread very soon to resolve the issue.

    Tuesday, July 30, 2019 9:26 AM
  • @Sunita Sarma, it sounds like you did some work in your data factory, creating resources, pipelines, etc., and THEN integrated with a repo.  I hope the following explanation will resolve the confusion.

    The authoring process is different when there is, or is not, a repo integrated.

    When there is no repo... and you open the Data Factory UI, the UI fetches the existing, published, definitions.  When you publish, these definitions are updated.

    When there is a repo... and you open the Data Factory UI, the UI fetches definitions from the repo.  When you click 'save', these definitions are updated.  When you delete a resource, the deletion is immediately committed to the repo (even without clicking 'save').  When you click 'publish', an ARM template artifact is created in the repo, and then the 'published definitions' are written to.

    When PowerShell is used to alter a resource... the 'published definitions' are updated, but any repo is ignored.

    When you integrated with the repo, the existing, published definitions were not imported into the repo.  Thus there is a difference between the 'Data Factory view' and the 'Repo view'.  One is looking at the published definitions, the other is looking at the repo definitions.
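
    The behavior described above can be sketched as a toy model. This is not ADF code: the FactoryModel class and its method names are hypothetical, and it only tracks which store (published definitions or repo) each action writes to. Deletions, and how the service treats resources that exist only in the published set, are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FactoryModel:
    """Toy model of the two definition stores; names are illustrative."""
    published: Dict[str, dict] = field(default_factory=dict)
    repo: Optional[Dict[str, dict]] = None  # None means no repo is integrated
    arm_templates: List[Dict[str, dict]] = field(default_factory=list)

    def ui_view(self) -> Dict[str, dict]:
        # The UI reads published definitions only when no repo is integrated.
        return self.published if self.repo is None else self.repo

    def save(self, name: str, definition: dict) -> None:
        # 'Save' writes to the repo, so it needs an integrated repo.
        if self.repo is None:
            raise RuntimeError("save only applies when a repo is integrated")
        self.repo[name] = definition

    def publish(self, changes: Optional[Dict[str, dict]] = None) -> None:
        if self.repo is None:
            # No repo: publishing updates the published definitions directly.
            self.published.update(changes or {})
        else:
            # Repo: record an ARM template artifact, then write the
            # published definitions (simplified to a plain update).
            snapshot = dict(self.repo)
            self.arm_templates.append(snapshot)
            self.published.update(snapshot)

    def powershell_set(self, name: str, definition: dict) -> None:
        # PowerShell changes touch the published definitions and skip the repo.
        self.published[name] = definition


factory = FactoryModel(published={"test1": {}}, repo={"test1": {}})
factory.powershell_set("test2", {})
print(sorted(factory.ui_view()))   # what the repo-backed UI shows
print(sorted(factory.published))   # what 'Data Factory mode' shows
```

    In this sketch the two views diverge after the PowerShell change, which is the same kind of difference between the 'Data Factory view' and the 'Repo view'.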

    Wednesday, July 31, 2019 1:16 AM
  • Correction:

    Upon creating the integration (not modifying existing), the existing, published contents of the Data Factory should be written to the repo.  Please tell me if this is not the case.  If you already are integrated, try removing the configuration, and then setting the configuration up again.

    Wednesday, July 31, 2019 10:46 PM
  • @MartinJaffer, upon integration with DevOps Git, the contents of the datafactory are written to the repo successfully but on the UI, it doesn't pick up the definitions from the repo for some reason. The ADF UI shows up empty with no pipelines or datasets. 
    Thursday, August 1, 2019 2:20 AM
  • Sometimes when I open the UI, it takes a minute to load.  Failing that, I would check which branch the UI is displaying, and then check the browser console for error messages or failed network requests.  If none of these give hints as to the cause, I could reach out internally, or refer you to customer support.
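
    If the browser's network tab lets you export the session as a HAR file, the failed requests can also be listed with a short script. This is a minimal sketch: it assumes the standard HAR layout (log.entries, each with request and response objects), and the sample data and URLs below are made up for illustration.

```python
# Hypothetical HAR content; a real export would be loaded with json.load().
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://example.com/ui/index.html"},
                "response": {"status": 200},
            },
            {
                "request": {"method": "GET", "url": "https://example.com/api/repo/items"},
                "response": {"status": 403},
            },
        ]
    }
}


def failed_requests(har: dict, min_status: int = 400) -> list:
    """Return (method, url, status) for each entry whose status is >= min_status."""
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry["response"]["status"]
        if status >= min_status:
            failures.append((entry["request"]["method"], entry["request"]["url"], status))
    return failures


for method, url, status in failed_requests(SAMPLE_HAR):
    print(method, url, status)
```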

    It helps to know that this looks to be more of a UI issue than a writing-to-repo issue.  There are a number of other tests I would like to try, but for now, while you check the above suggestions, I will search for similar cases.

    Thursday, August 1, 2019 7:00 PM
  • I have caught news of a similar issue.  Let me know if you wish to be updated.
    Monday, August 5, 2019 6:57 PM
  • @MartinJaffer, thank you for your response. I raised a support ticket with MSFT through the portal, and they have forwarded the ticket to the ADF product team for further investigation. Meanwhile, if you have come across a similar issue or resolution, I would like to hear about it. I did double-check the branch in the UI and did not come across any error messages. The ADF activity log wasn't very useful in this case either.
    Tuesday, August 6, 2019 1:34 PM
  • I looked at the developer settings (network) on the browser when setting up git repo on ADF GUI and encountered HTTP GET 403 errors which I believe caused this issue. It's resolved now (I think it was a network security issue which was fixed subsequently)
    Wednesday, August 14, 2019 6:33 PM
  • Thank you very much for the insight Sunita.  Your contribution benefits us all.  I will recommend your course of action (checking browser developer view) to others who encounter the same issue.
    Thursday, August 15, 2019 1:08 AM