Azure Data Factory (v2?): pull the last 7 days of on-prem files into an activity that uses Data Lake and U-SQL to process them


  • I have many on-prem files, mostly named in the format YYYYMMDD.csv. We are trying to use ADF or ADFv2 to pull these files, ultimately into an Azure SQL Database for display on a Power BI dashboard. The requirement is that the files be pulled every 15 minutes. The gotcha is that a file may not appear / get generated until several days after the date in its name, so 20180101.csv might not appear until 20180103.csv does. So the thought was to pull the past 7 days' worth of files at each 15-minute interval to make sure we are covered. The problems I am having are:

    1) The Copy activity pulls all files from a folder, or files matching a wildcard (e.g. 201801{*}.csv), or {Year}{Month}{Day}.csv using parameters. But how do I explicitly provide a "range" of files, i.e. the past 7 days? My understanding is that ADF can't look at the timestamp of the file (so is there an Azure Function or something else I have to use?)

    2) Data Lake Analytics with U-SQL uses the @In and @out parameters, each of which must be a valid "hardcoded" path and file. I tried the "u-sql multifile" example, but it returns errors that suggest it expects to act on only one file at a time. Note that I am also using the cleanup script to delete any data already in the database for the YYYYMMDD...
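
    To make the "range" in 1) concrete, here is a minimal Python sketch of the selection being asked for: given a folder listing, keep only the YYYYMMDD.csv names whose date falls in the past 7 days. It works from the date in the file name rather than the file timestamp. The function name `files_in_window` and its parameters are hypothetical, chosen for illustration; this is not an ADF or U-SQL API, and it would have to run somewhere outside the Copy activity (for example in a custom step).

```python
from datetime import date, timedelta
import re

# Hypothetical helper for illustration only; not part of ADF or U-SQL.
def files_in_window(names, today, days=7):
    """Return the YYYYMMDD.csv names dated within the last `days` days, today included."""
    start = today - timedelta(days=days - 1)
    pattern = re.compile(r"^(\d{4})(\d{2})(\d{2})\.csv$")
    selected = []
    for name in names:
        match = pattern.match(name)
        if not match:
            continue  # skip anything that is not named YYYYMMDD.csv
        try:
            file_date = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # skip impossible dates such as 20181340.csv
        if start <= file_date <= today:
            selected.append(name)
    return sorted(selected)
```

    For example, with `today = date(2018, 6, 11)` the window starts on 2018-06-05, so `20180605.csv` and `20180611.csv` are kept while `20180101.csv` and `20180612.csv` are dropped.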

    I have this working for the current window, i.e. the current day, but going back 7 days escapes me. Any ideas?

    I had thought of using ADFv2 and trying the for-each with a query that returns a list of days, but it looks like it will get messy fast. What am I missing?
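
    As a rough illustration of the "list of days" idea, the sketch below builds the 7 file names that a for-each would iterate over, newest first. It is plain Python rather than an ADF expression or query, and `last_n_day_names` is a hypothetical name used only for this example.

```python
from datetime import date, timedelta

# Hypothetical helper for illustration only; not an ADF expression or activity.
def last_n_day_names(today, days=7, suffix=".csv"):
    """Return file names for `today` and the preceding days, newest first."""
    return [
        (today - timedelta(days=offset)).strftime("%Y%m%d") + suffix
        for offset in range(days)
    ]
```

    Calling `last_n_day_names(date(2018, 1, 3))` yields `20180103.csv` down to `20171228.csv`, so the list crosses month and year boundaries without any special handling.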


    Monday, June 11, 2018 7:26 PM

All replies