Using a subset of an output dataset as input for the next step in a pipeline

  • Question

  • Hello,

    I have a pipeline with multiple steps as follows:

    1. Copy Data to Blob
    2. Convert Data from XML to JSON
    3. Copy Data to SQL

    The second step is a custom activity which writes the converted data to a different blob container. However, the input for this step contains different kinds of data, which must be inserted into different tables in SQL. Is there an easy or out-of-the-box way to filter the output dataset of step 2 so that it can be used as the input dataset for step 3?

    The filename structure is MyFileTypeName_yyyy-MM-dd-hh-mm-ss.json

    Cheers

    Tom 


    Have a look at my mobile games on http://www.blugri.com

    Wednesday, August 16, 2017 9:15 AM

All replies

  • Well, you would simply define multiple output datasets for your custom activity.

    Each dataset then references a different path in the blob store, and each can then be used as an input for your third step.

    -gerhard


    Gerhard Brueckl
    blogging @ http://blog.gbrueckl.at
    working @ http://www.pmOne.com

    Wednesday, August 16, 2017 11:49 AM
  • Hi Gerhard,

    Thanks for the response. But how would I partition the output dataset based on the file name? Is this standard behaviour, or do I need a custom activity?

    Cheers

    Tom



    Wednesday, August 16, 2017 2:16 PM
  • That's a standard feature of the Blob dataset.

    You can specify a root path or a file pattern.
    The root path also supports partitions by date/time.


    Gerhard Brueckl
    blogging @ http://blog.gbrueckl.at
    working @ http://www.pmOne.com

    Thursday, August 17, 2017 9:48 AM
  • Hi Gerhard,

    I think that there is a misunderstanding in my question. I do not want to filter/partition on the date, but I want to partition on the part of the file name before the date. I have a folder with multiple files:

    FileNameTypeX_20170817.xml

    FileNameTypeY_20170817.xml

    ...

    I want to have all files that start with FileNameTypeX... in one output dataset, and all files that start with FileNameTypeY... in another output dataset.

    Cheers

    Tom



    Thursday, August 17, 2017 9:59 AM
  • Well, since you have a custom activity, you can also have it write the different file types into different folders and then reference each folder in its own dataset.

    Gerhard Brueckl
    blogging @ http://blog.gbrueckl.at
    working @ http://www.pmOne.com

    Thursday, August 17, 2017 12:09 PM
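
The folder-per-type idea from the last reply could be sketched roughly as follows. This is only an illustration of the routing logic, not Data Factory code: the helper names (`file_type`, `target_path`, `group_by_type`) and the `converted` root folder are hypothetical, and it assumes the file type is everything before the last underscore in the file name, as in the examples in this thread.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def file_type(blob_name: str) -> str:
    """Return the part of the file name before the last underscore.

    Assumption: names look like <Type>_<date-or-timestamp>.<ext>.
    """
    stem = PurePosixPath(blob_name).stem  # drop folder and extension
    prefix, sep, _ = stem.rpartition("_")
    # If there is no underscore, treat the whole stem as the type.
    return prefix if sep else stem


def target_path(blob_name: str, root: str = "converted") -> str:
    """Build a destination path with one folder per file type.

    'converted' is a hypothetical root folder, not a name from the thread.
    """
    name = PurePosixPath(blob_name).name
    return f"{root}/{file_type(blob_name)}/{name}"


def group_by_type(blob_names):
    """Map each file type to the destination paths of its files."""
    groups = defaultdict(list)
    for blob_name in blob_names:
        groups[file_type(blob_name)].append(target_path(blob_name))
    return dict(groups)


if __name__ == "__main__":
    converted = [
        "FileNameTypeX_20170817.json",
        "FileNameTypeY_20170817.json",
    ]
    for kind, paths in group_by_type(converted).items():
        print(kind, paths)
```

With the files laid out this way, each output dataset can point at one type folder, and each copy to SQL can take its matching dataset as input.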