Azure loading files from Blob Storage that don't contain co

    Question

  • Hi,

    I'm new to Data Factory and I've noticed that all of the examples I've found for Blob Storage source data are for files that sit in daily or monthly folders, or for data that is refreshed so that Data Factory takes, for example, hourly slices.

    My situation is different: we get files of data, and these files can be replaced with new files containing updated metrics.

    I just want to load the data into a snapshot table.

    Once loaded, these files are removed from the folder, ready for a new batch to be added.

    I'm not sure how to deal with this specific case in Azure Data Factory. What would the Start and End be in the pipeline? What would the interval and frequency be set to?


    Debbie

    Monday, October 15, 2018 3:19 PM

All replies

  • I would love to help but don't quite understand the question you're asking.  Can you add more clarity to what you're trying to do with Blob storage via ADF?  Are you using ADFv1 or ADFv2?
    Monday, October 15, 2018 6:22 PM
  • Hi,

    Based on your requirement, you can have a Copy Data task first and then a custom task in your pipeline to remove those files. Start date, end date, and frequency are used to trigger/schedule the pipeline and should be set based on your requirement.
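
    As a rough illustration of that copy-then-remove flow, here is a minimal sketch that builds an ADFv2-style pipeline definition as a plain Python dict. It is not a tested pipeline, and it makes several assumptions that are not from this thread: it uses ADFv2's built-in Delete activity in place of a custom task, it adds a Get Metadata plus If Condition guard so that an empty folder does nothing, it assumes a blob source and a SQL sink, and the dataset and activity names (`IncomingFilesFolder`, `StagingTable`, `ListFiles`, and so on) are hypothetical.

```python
import json


def dataset_ref(name):
    """Reference to a dataset that is assumed to already exist in the factory."""
    return {"referenceName": name, "type": "DatasetReference"}


def build_load_pipeline(pipeline_name, folder_dataset, staging_dataset):
    """Build a pipeline dict: list files, and only if any exist, copy then delete."""
    # List the files currently sitting in the incoming folder.
    list_files = {
        "name": "ListFiles",
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": dataset_ref(folder_dataset),
            "fieldList": ["childItems"],
        },
    }
    # Copy the folder contents into the staging table (source/sink types assumed).
    copy_files = {
        "name": "CopyToStaging",
        "type": "Copy",
        "inputs": [dataset_ref(folder_dataset)],
        "outputs": [dataset_ref(staging_dataset)],
        "typeProperties": {
            "source": {"type": "BlobSource", "recursive": True},
            "sink": {"type": "SqlSink"},
        },
    }
    # Remove the files only after the copy has succeeded.
    remove_files = {
        "name": "RemoveLoadedFiles",
        "type": "Delete",
        "dependsOn": [
            {"activity": "CopyToStaging", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "dataset": dataset_ref(folder_dataset),
            "recursive": True,
            "enableLogging": False,
        },
    }
    # Run the copy and delete only when the folder listing is non-empty.
    only_if_files = {
        "name": "IfFilesPresent",
        "type": "IfCondition",
        "dependsOn": [
            {"activity": "ListFiles", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "expression": {
                "value": "@greater(length(activity('ListFiles').output.childItems), 0)",
                "type": "Expression",
            },
            "ifTrueActivities": [copy_files, remove_files],
        },
    }
    return {
        "name": pipeline_name,
        "properties": {"activities": [list_files, only_if_files]},
    }


# Hypothetical names, for illustration only.
pipeline = build_load_pipeline(
    "LoadSnapshotPipeline", "IncomingFilesFolder", "StagingTable"
)
print(json.dumps(pipeline, indent=2))
```

    The printed JSON could serve as a starting point for a pipeline definition. The dataset definitions and linked services it refers to would still need to be created separately.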


    Cheers,

    Tuesday, October 16, 2018 3:21 AM
  • Let me have a go at explaining this better.

    We download 5 files and store them in a SharePoint folder. Each file is a month's worth of data.

    The files contain metrics, and these metrics can change every time we download them. So if the current month is October, we download October, September, August, July and June, and there could be changes in all the files.

    So currently I truncate a staging table and import all the records into this staging table.

    Then the files get moved (possibly into a dated historical folder).

    Stored procedures do the rest of the ETL processing to move the data into the main table.

    More files get uploaded to the SharePoint folder and the process begins again.

    When we get to November, the 5 files will be for November, October, September, August and July.

    I would like Data Factory to run the process every night. This is easy in Integration Services, but I'm lost with Data Factory.

    The files are added by users, so if there are no files in the folder, nothing should happen.

    In this case I don't understand the Interval or the Start and End dates, because I just want it to load files if they are in the folder.

    I haven't started yet, so which one is better for this: ADFv2 or ADFv1?

    Any help would be greatly appreciated. I'm currently searching for examples of this specific type of import but can't find anything.


    Debbie

    Tuesday, October 16, 2018 8:30 AM
  • This is quite a good one, I think, looking at your info here: https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-schedule-trigger

    Debbie

    Tuesday, October 16, 2018 8:32 AM
  • If it's a new project, use ADFv2.  It's the future of the product and has much more functionality being built into it.

    Do you have a timestamp column to load incrementally?  Or do the metrics in the previous month's files also change nightly or monthly?  

    You can create a Trigger in ADF based on a schedule (in your case nightly) and you don't need to do much else.  Here's an example setup for the Trigger you would want.

    • Name: YourTrigger
    • Description: YourDescription
    • Type: Schedule (options are Schedule, Tumbling, and Event). Alternatively, you might want to use Event and base it on whether the file exists. This will require you to delete the file when you're done processing. https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger
    • Start Date: The first date you want it to begin. Note that the time you set here will also be the time it runs every day.
    • Recurrence: Daily - Every: 1 Day(s)
    • Advanced Recurrence options: Leave blank
    • End: No End
    • Activated: Checked
    • Save

    Then when you create the pipeline, there will be an option for "Trigger" that you can select New/Edit.  Then you can add the trigger to the pipeline and it will run nightly at your specified time.
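
    For reference, the settings above correspond roughly to a schedule trigger definition like the one built below. This is only a sketch: it assembles the JSON as a Python dict so the fields are easy to see. The pipeline name `LoadSnapshotPipeline` and the start time are made-up placeholders, and leaving out `endTime` is how the "No End" option is represented here.

```python
import json
from datetime import datetime, timezone


def build_nightly_trigger(trigger_name, pipeline_name, start_time):
    """Return a daily schedule-trigger definition as a plain dict."""
    # Normalise to UTC; the time of day of start_time is when each run fires.
    start_utc = start_time.astimezone(timezone.utc)
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",  # Recurrence: Daily
                    "interval": 1,       # Every: 1 Day(s)
                    "startTime": start_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "timeZone": "UTC",
                    # No "endTime" key: the trigger has no end date.
                }
            },
            # The pipeline(s) this trigger starts.
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    },
                    "parameters": {},
                }
            ],
        },
    }


# Placeholder values, for illustration only.
trigger = build_nightly_trigger(
    "YourTrigger",
    "LoadSnapshotPipeline",
    datetime(2018, 10, 17, 2, 0, tzinfo=timezone.utc),
)
print(json.dumps(trigger, indent=2))
```

    With these placeholder values the trigger would fire once a day at 02:00 UTC, starting on the given date.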


    Tuesday, October 16, 2018 1:52 PM