Dynamic date-based fileFilter in an sFTP dataset

  • Question

  • We work with a third-party vendor that has given us access to data that we use for our warehouse. I have already set up the linkedServices, Datasets, and Pipelines, and everything worked well during our development phase.

    The problem now is that this vendor (a very large vendor that services tens of thousands of clients this way) puts ALL the zip files we need (each zip file contains a single csv) in one folder. The file names are date-based, and the vendor leaves previous data files in there for an undetermined period of time. We only have read access to this folder.

    So, for example:

    root/Output may contain something like this:

    csvFileData_areaA_2017_11_15.zip
    csvFileData_areaB_2017_11_15.zip
    csvFileData_areaC_2017_11_15.zip
    csvFileData_areaD_2017_11_15.zip
    csvFileData_areaA_2017_11_16.zip
    csvFileData_areaB_2017_11_16.zip
    csvFileData_areaC_2017_11_16.zip
    csvFileData_areaD_2017_11_16.zip

    So when I run my pipeline on the 16th of the month, I only want it to grab the files dated the 16th.

    I was thinking of using the fileFilter, but it would have to match the date of the run.

    Something like csvFileData_areaD_{year}_{month}_{day}.zip, if this is even an option. But I was also worried that if the process broke over the weekend, or didn't run for whatever reason, it couldn't pick up files from the previous day(s) that were missed. So I guess the question is: is there a way to pull in a parameter or something? Or to somehow keep track of the files that we have already fetched and processed via FTP?
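
    To make the "keep track of the ones we already got" idea concrete, here is a minimal Python sketch of the selection logic, independent of Data Factory itself. The regular expression is an assumption based on the example file names above, and `pending_files`, `listing`, and `processed` are hypothetical names; how the processed set would actually be stored (a table, a blob, a log) is left open.

```python
import re
from datetime import date

# Assumed naming scheme, taken from the example listing in this post:
# csvFileData_<area>_<yyyy>_<MM>_<dd>.zip
NAME_RE = re.compile(
    r"^csvFileData_(?P<area>\w+?)_(?P<y>\d{4})_(?P<m>\d{2})_(?P<d>\d{2})\.zip$"
)


def file_date(name):
    """Return the date embedded in a file name, or None if the name does not match."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    return date(int(match["y"]), int(match["m"]), int(match["d"]))


def pending_files(listing, processed, run_date):
    """Return files dated on or before run_date that are not in the processed set.

    Because the check is "not yet processed" rather than "dated today",
    files missed on earlier days are picked up by the next successful run.
    """
    todo = []
    for name in listing:
        dated = file_date(name)
        if dated is not None and dated <= run_date and name not in processed:
            todo.append(name)
    # Oldest files first, then alphabetical within a day.
    return sorted(todo, key=lambda n: (file_date(n), n))
```

    With this approach, a run on the 16th that finds an unprocessed file from the 15th would return it ahead of the files for the 16th.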

    I have been creating everything through JSON scripts, so I don't really see a way.

    Thanks.

    Wednesday, November 29, 2017 3:12 PM

All replies

  • I think I might have answered my own question. Is it possible to use partitionedBy with fileName? Something like this?

          "fileName": "csvFileData_*_{Year}_{Month}_{Day}.zip",
          "partitionedBy": [
            {
              "name": "Year",
              "value": {
                "type": "DateTime",
                "date": "SliceStart",
                "format": "yyyy"
              }
            },
            {
              "name": "Month",
              "value": {
                "type": "DateTime",
                "date": "SliceStart",
                "format": "MM"
              }
            },
            {
              "name": "Day",
              "value": {
                "type": "DateTime",
                "date": "SliceStart",
                "format": "dd"
              }
            }
          ],
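
    For reference, the substitution this snippet is hoping for can be sketched in plain Python. This only illustrates how the {Year}, {Month}, and {Day} placeholders would expand for a given slice start and which of the example files the result would match. It does not confirm that the service accepts a wildcard inside fileName, and `expand_template` is a hypothetical helper, not part of any Data Factory API.

```python
from datetime import datetime
from fnmatch import fnmatchcase

# Same placeholder names as the JSON snippet above, with underscores to
# match the vendor's file names.
TEMPLATE = "csvFileData_*_{Year}_{Month}_{Day}.zip"


def expand_template(template, slice_start):
    """Fill the date placeholders from the slice start time."""
    return template.format(
        Year=slice_start.strftime("%Y"),   # four-digit year
        Month=slice_start.strftime("%m"),  # zero-padded month
        Day=slice_start.strftime("%d"),    # zero-padded day
    )


def matching_files(listing, template, slice_start):
    """Return the names in listing that match the expanded wildcard pattern."""
    pattern = expand_template(template, slice_start)
    return [name for name in listing if fnmatchcase(name, pattern)]
```

    For a slice starting on 2017-11-16, the template expands to csvFileData_*_2017_11_16.zip, which matches the four files for the 16th and none of the files for the 15th.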

    Wednesday, November 29, 2017 3:51 PM