Azure Pipeline to load blob file to Azure SQL database where file name is dynamic

    Question

  • Hello All,

    Can you help me with the following?

    I have two files in the same blob container:

    2018-10-23_Depot

    2018-10-23_DockType

    The data from the two files should go into two different tables (table names: Depot, DockType).

    The date changes every time, but the rest of the file name always stays the same; it does not change.

    How do I load these files through an ADF pipeline, and how do I pass the file names as "Depot" and "DockType" in the pipeline?

    Quick help is highly appreciated 

    Thanks 

    Regards,

    Mahesh


    Mahesh Dadhich

    Tuesday, October 23, 2018 9:40 AM

All replies

  • Hi there,

    Please follow the steps below:

    1. Create an Azure Blob dataset with a parameter named "filename" and reference that parameter in the dataset's fileName property.

    2. Similarly, create an Azure SQL dataset with a parameter named "tablename" and reference it in the tableName property.

    3. Create a pipeline with a ForEach activity; inside the ForEach activity, add a Copy activity that uses the blob dataset above as the source and the Azure SQL dataset as the sink.

    4. In the pipeline, create a parameter named "sourceAndSinkItems" of type Array with the default value

    [{"source":"2018-10-23_Depot","sink":"Depot"},{"source":"2018-10-23_DockType","sink":"DockType"}]

    and pass it to the "items" property of the ForEach activity.

    5. In the Copy activity, pass

    @item().source

    to the "filename" parameter of the Azure Blob dataset and

    @item().sink

    to the "tablename" parameter of the SQL dataset.

    6. Trigger a pipeline run.

    Below are sample JSON definitions for the pipeline, source dataset, and sink dataset:

    Pipeline:

    {
        "name": "pipeline38",
        "properties": {
            "activities": [
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "typeProperties": {
                        "items": {
                            "value": "@pipeline().parameters.sourceAndSinkItems",
                            "type": "Expression"
                        },
                        "activities": [
                            {
                                "name": "Copy Data1",
                                "type": "Copy",
                                "typeProperties": {
                                    "source": {
                                        "type": "BlobSource",
                                        "recursive": true
                                    },
                                    "sink": {
                                        "type": "SqlSink",
                                        "writeBatchSize": 10000
                                    }
                                },
                                "inputs": [
                                    {
                                        "referenceName": "AzureBlob1",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "filename": {
                                                "value": "@item().source",
                                                "type": "Expression"
                                            }
                                        }
                                    }
                                ],
                                "outputs": [
                                    {
                                        "referenceName": "AzureSqlTable2",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "tablename": {
                                                "value": "@item().sink",
                                                "type": "Expression"
                                            }
                                        }
                                    }
                                ]
                            }
                        ]
                    }
                }
            ],
            "parameters": {
                "sourceAndSinkItems": {
                    "type": "Array",
                    "defaultValue": [
                        {
                            "source": "2018-10-23_Depot",
                            "sink": "Depot"
                        },
                        {
                            "source": "2018-10-23_DockType",
                            "sink": "DockType"
                        }
                    ]
                }
            }
        }
    }

    Blob dataset:

    {
        "name": "AzureBlob1",
        "properties": {
            "linkedServiceName": {
                "referenceName": "azurestorage1",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "filename": {
                    "type": "String"
                }
            },
            "type": "AzureBlob",
            "typeProperties": {
                "format": {
                    "type": "TextFormat",
                    "columnDelimiter": ",",
                    "nullValue": "\\N",
                    "treatEmptyAsNull": true,
                    "skipLineCount": 0,
                    "firstRowAsHeader": false
                },
                "fileName": {
                    "value": "@dataset().filename",
                    "type": "Expression"
                },
                "folderPath": "aaa"
            }
        },
        "type": "Microsoft.DataFactory/factories/datasets"
    }

    Azure SQL dataset:

    {
        "name": "AzureSqlTable2",
        "properties": {
            "linkedServiceName": {
                "referenceName": "azuresql1",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "tablename": {
                    "type": "String"
                }
            },
            "type": "AzureSqlTable",
            "typeProperties": {
                "tableName": {
                    "value": "@dataset().tablename",
                    "type": "Expression"
                }
            }
        },
        "type": "Microsoft.DataFactory/factories/datasets"
    }

    Hope it'll help.



    Tuesday, October 23, 2018 2:54 PM
  • Hi Wang,

    I am sorry, but I did not understand how to create the parameter and what the "filename" parameter is for.


    I am trying to pass the file name dynamically in the input blob dataset:

    "fileName": { 
    "value": "@concat(formatDateTime(utcnow(),'yyyy'),'-', formatDateTime(utcnow(),'MM'),'-',formatDateTime(utcnow(),'dd'), '-','Depot.CSV')"
    "type" : "Expression"
    },

    It provisions the dataset successfully, but the pipeline run fails with:

    Failed to get blob reference from storage service

    I am trying to produce the format "(yyyy-MM-dd)-Depots.CSV", where yyyy-MM-dd is the current date.

    Can you help and suggest what I might be doing wrong?


    Mahesh Dadhich

    Tuesday, October 23, 2018 5:17 PM
  • Does utcnow() refer to the current date? If not, which function should be used for the current date?

    Mahesh Dadhich

    Tuesday, October 23, 2018 5:41 PM
  • Hi there,

    • I am sorry, but I did not understand how to create the parameter and what the "filename" parameter is for.

    To pass a dynamic value to the fileName of the blob dataset, you can either pass an expression (like "@concat(...)") or a dataset parameter. In the example above, the file name we need is stored in the "items" array consumed by the ForEach activity, so it is necessary to create a dataset parameter (here named "filename") in the dataset, reference it in the "fileName" property, and then pass a value to that parameter in the Copy activity. The file name flows from the ForEach activity to the Copy activity to the dataset, as sketched below.
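    A minimal sketch of that chain, trimmed from the sample definitions above (same names as in the sample JSON):

    ForEach activity ("items" property):
        "items": { "value": "@pipeline().parameters.sourceAndSinkItems", "type": "Expression" }

    Copy activity (parameters on the blob dataset reference):
        "filename": { "value": "@item().source", "type": "Expression" }

    Blob dataset (typeProperties):
        "fileName": { "value": "@dataset().filename", "type": "Expression" }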

    • Does utcnow() refer to the current date? If not, which function should be used for the current date?

    Yes, utcnow() returns the current date and time in UTC, and I don't see anything wrong with the expression "@concat(formatDateTime(utcnow(),'yyyy'),'-', formatDateTime(utcnow(),'MM'),'-',formatDateTime(utcnow(),'dd'), '-','Depot.CSV')". Please confirm that a file matching this pattern actually exists in the blob container. Thanks.
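    As a side note, the same file name can also be built with a single format string; this is just a shorter equivalent of the expression above, assuming the same "-Depot.CSV" suffix:

    "fileName": {
        "value": "@concat(formatDateTime(utcnow(), 'yyyy-MM-dd'), '-Depot.CSV')",
        "type": "Expression"
    }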


    • Edited by Wang Zhang Wednesday, October 24, 2018 2:11 AM
    Wednesday, October 24, 2018 2:10 AM
  • Hi Wang,

    Really appreciate your help!

    I am not using Data Factory v2, so I do not have the wizard or pipeline editor; I am using Data Factory v1. Can you suggest a solution for v1, where I am writing the JSON by hand?


    Mahesh Dadhich

    Wednesday, October 24, 2018 12:35 PM