ADF V2 - Managing Datasets

  • Question

  • I'm getting familiar with ADF V2, as my company is migrating everything to Azure.  If I am not mistaken, a dataset is very specific to the procedure/query/table that is used in each Activity (for each Source and Destination).  In a Data Warehouse, there will be a very large number of Activities/Datasets.  How do you manage these? 

    melissalevitt@hotmail.com

    Monday, June 4, 2018 7:54 PM

All replies

  • You are right that a dataset is very specific to the procedure/query/table that is used in each Activity. But in ADF V2, you can use expressions. Then you don't need to create a new dataset for each table; instead, you can reuse one. 

    For example, you create a dataset parameter named tableName. Then in the dataset's tableName field, you just put @dataset().tableName. When using it in a pipeline, you pass the real value to it in a ForEach loop, as sketched below.
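
    Roughly, a parameterized dataset might look like the JSON below. The dataset name, linked service name, and table type are just placeholders for illustration; the key parts are the parameters block and the @dataset().tableName expression in typeProperties.

    {
        "name": "ParameterizedTableDS",
        "properties": {
            "linkedServiceName": {
                "referenceName": "MySqlServerLS",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "tableName": { "type": "String" }
            },
            "type": "SqlServerTable",
            "typeProperties": {
                "tableName": {
                    "value": "@dataset().tableName",
                    "type": "Expression"
                }
            }
        }
    }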

    You could also take a look at the ForEach activity; a pipeline snippet using it is sketched below.
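
    Something along these lines, just to show how @item() flows into the dataset parameter. ParameterizedTableDS is the dataset sketched above; the pipeline name, the tableList parameter, and the sink dataset/sink type are placeholders you would replace with your own.

    {
        "name": "CopyAllTablesPipeline",
        "properties": {
            "parameters": {
                "tableList": { "type": "Array" }
            },
            "activities": [
                {
                    "name": "ForEachTable",
                    "type": "ForEach",
                    "typeProperties": {
                        "items": {
                            "value": "@pipeline().parameters.tableList",
                            "type": "Expression"
                        },
                        "activities": [
                            {
                                "name": "CopyOneTable",
                                "type": "Copy",
                                "inputs": [
                                    {
                                        "referenceName": "ParameterizedTableDS",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "tableName": "@item()"
                                        }
                                    }
                                ],
                                "outputs": [
                                    {
                                        "referenceName": "MySinkDS",
                                        "type": "DatasetReference"
                                    }
                                ],
                                "typeProperties": {
                                    "source": { "type": "SqlSource" },
                                    "sink": { "type": "BlobSink" }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }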


    Tuesday, June 5, 2018 2:33 AM
  • Thank you!!

    Is there any additional documentation on creating these expressions (as they're passed from one activity or pipeline to another)?  The above-referenced link just does not have enough detail.


    • Edited by zl34 Wednesday, June 6, 2018 6:39 PM
    Wednesday, June 6, 2018 6:33 PM
  • In some situations you can use a generic dataset. I hate having a dataset for every table; that gets crazy. For example, the JSON for this dataset:

    {
        "name": "GenericDS_input_auto",
        "properties": {
            "linkedServiceName": {
                "referenceName": "server1",
                "type": "LinkedServiceReference"
            },
            "type": "SqlServerTable",
            "typeProperties": {
                "tableName": "PlaceHolderTable"
            }
        }
    }

    If you use the above DS in a copy activity with SQL Server as the source and specify a custom query as the source, it just ignores that table name. So you can actually use this one DS for all tables on that source; you just need a custom source query for each copy activity, roughly like the snippet below.
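
    A sketch of such a copy activity, assuming a SQL source: the activity name, sink dataset, sink type, and the query itself are only for illustration, but the idea is that sqlReaderQuery on the source overrides the placeholder tableName in the generic dataset.

    {
        "name": "CopyCustomerTable",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "GenericDS_input_auto",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "CustomerSinkDS",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlSource",
                "sqlReaderQuery": "SELECT * FROM dbo.Customer"
            },
            "sink": { "type": "BlobSink" }
        }
    }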


    Kyle Clubb

    • Proposed as answer by Kyle Clubb Wednesday, June 6, 2018 8:04 PM
    Wednesday, June 6, 2018 8:03 PM
  • Please reference this tutorial.

    https://docs.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal


    Thursday, June 7, 2018 4:44 AM