Copy incremental data from database and blob storage in one pipeline

  • Question

  • I want to create a pipeline that copies incremental data from both a database and blob storage. I found a way to handle the database part, but I could not find a proper method for the blob part. All tasks must be implemented in one pipeline, so I cannot use an event-based trigger for that pipeline, or the preceding tasks would be redone every time a new blob object is found. I tried passing a parameter to the file path (e.g. container/@{dataset().folderNamedByDate}), but it failed. Does anybody have a solution for my case? Thanks a lot.
    Thursday, September 13, 2018 6:08 AM

All replies

  • If you are using a schedule trigger, you could pass @trigger().scheduledTime to your pipeline parameter and then to your dataset parameter (see the sketch below the links).

    If you are using a tumbling window trigger, you could use @trigger().outputs.windowStartTime instead.

    Please take a look at the docs:

    https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger

    https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-schedule-trigger
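
    As a minimal sketch, assuming placeholder names (DailyScheduleTrigger, IncrementalCopyPipeline, and the windowStart parameter are not from your factory), a schedule trigger that passes the trigger time to a pipeline parameter could look like this:

    {
        "name": "DailyScheduleTrigger",
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": "2018-09-13T00:00:00Z",
                    "timeZone": "UTC"
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": "IncrementalCopyPipeline"
                    },
                    "parameters": {
                        "windowStart": "@trigger().scheduledTime"
                    }
                }
            ]
        }
    }

    With a tumbling window trigger, the parameter value would be @trigger().outputs.windowStartTime instead of @trigger().scheduledTime.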

    Thursday, September 13, 2018 11:06 AM
  • I cannot use @trigger().scheduledTime or @trigger().outputs.windowStartTime, because I place the daily incremental file in a folder named [yyyy-MM-dd]. I set a pipeline parameter to receive that value and set it to utcnow('yyyy-MM-dd') in the trigger, but I got an error saying that utcnow is not a built-in function. So I tried giving a fixed value (e.g. '2018-09-12') when triggering, but another error occurred: [The required Blob is missing], even though my container does have a folder named 2018-09-12. When I do not use a parameter and set a fixed value for the file path, like [container/folder], the pipeline works properly. Could you tell me how to correctly use a parameter to solve my problem? Thanks!
    Tuesday, September 18, 2018 8:45 AM
  • 1. You could use the Copy Data tool to help you build the pipeline. It will generate the parameter part for you.

    Please reference this post and this post.

    2. The expression you need is in the following format. You need to create a pipeline parameter first and then pass the trigger value to it (see the dataset sketch below).

    @{formatDateTime(pipeline().parameters.windowStart,'yyyy-MM-dd')}

    3. This is the official doc for writing partitioned data.
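
    For example, a rough sketch of a blob dataset that exposes a folder-name parameter and uses it in its folder path (the dataset, linked service, and parameter names here are illustrative assumptions):

    {
        "name": "BlobDailyFolderDataset",
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": {
                "referenceName": "AzureStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "folderNamedByDate": { "type": "String" }
            },
            "typeProperties": {
                "format": { "type": "TextFormat" },
                "folderPath": "container/@{dataset().folderNamedByDate}"
            }
        }
    }

    The copy activity would then set folderNamedByDate to @{formatDateTime(pipeline().parameters.windowStart,'yyyy-MM-dd')}, so each run points at the folder for that day (e.g. 2018-09-12).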

    Tuesday, September 18, 2018 2:22 PM
  • Assume you want to copy from the database to blobA and then copy from blobA to blobB.

    1. You need a pipeline that contains two copy activities, and your pipeline should have two parameters, which would also be used in your query.

    2. Then you could create a dataset for blobA, let's name it dataset_blob, which will be used as the sink dataset of the first copy activity and the source dataset of the second copy activity. (You could also create two datasets, one used as the sink of the first copy activity and another used as the source of the second; most of their settings would be the same, for example they could reference the same linked service.) The dataset should have a parameter to accept the folder or file name.



    3. Then in your copy activities, pass a value to the dataset parameter, as sketched below.
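
    Putting steps 1-3 together, a rough sketch of the pipeline JSON could look like the following. It assumes a SQL source dataset named dataset_sql, a destination dataset named dataset_blobB, a ModifiedDate watermark column, and windowStart/windowEnd parameters; all of these names are illustrative, not taken from your factory:

    {
        "name": "IncrementalCopyPipeline",
        "properties": {
            "parameters": {
                "windowStart": { "type": "String" },
                "windowEnd": { "type": "String" }
            },
            "activities": [
                {
                    "name": "CopyFromDatabaseToBlobA",
                    "type": "Copy",
                    "typeProperties": {
                        "source": {
                            "type": "SqlSource",
                            "sqlReaderQuery": "SELECT * FROM MyTable WHERE ModifiedDate >= '@{pipeline().parameters.windowStart}' AND ModifiedDate < '@{pipeline().parameters.windowEnd}'"
                        },
                        "sink": { "type": "BlobSink" }
                    },
                    "inputs": [ { "referenceName": "dataset_sql", "type": "DatasetReference" } ],
                    "outputs": [
                        {
                            "referenceName": "dataset_blob",
                            "type": "DatasetReference",
                            "parameters": {
                                "folderNamedByDate": "@{formatDateTime(pipeline().parameters.windowStart,'yyyy-MM-dd')}"
                            }
                        }
                    ]
                },
                {
                    "name": "CopyFromBlobAToBlobB",
                    "type": "Copy",
                    "dependsOn": [
                        { "activity": "CopyFromDatabaseToBlobA", "dependencyConditions": [ "Succeeded" ] }
                    ],
                    "typeProperties": {
                        "source": { "type": "BlobSource" },
                        "sink": { "type": "BlobSink" }
                    },
                    "inputs": [
                        {
                            "referenceName": "dataset_blob",
                            "type": "DatasetReference",
                            "parameters": {
                                "folderNamedByDate": "@{formatDateTime(pipeline().parameters.windowStart,'yyyy-MM-dd')}"
                            }
                        }
                    ],
                    "outputs": [ { "referenceName": "dataset_blobB", "type": "DatasetReference" } ]
                }
            ]
        }
    }

    The trigger then supplies windowStart (and windowEnd, if you use one) as described above, so both copy activities operate on the same [yyyy-MM-dd] folder in a single pipeline run.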


    Wednesday, September 19, 2018 2:16 AM
  • And since you said the first part has been resolved, I assume you know how to use a trigger and pass trigger values to pipeline parameters.

    Wednesday, September 19, 2018 2:25 AM