ADF file copy

    Question

  • Dear ADF Experts,

    I can't find any info on this. My potential customer is looking into copying a lot of files at a very short interval. Is there a transfer limit on the copy function (Copy Activity)? For example, 10 copy activities per minute? What could happen if they schedule a copy of 10 relatively large files every 5 seconds? Do we need to fine-tune the performance manually?

    Thanks,

    Thomas

    Tuesday, August 28, 2018 1:51 AM

All replies

  • First, let's make it clear that the "Copy" activity is the "Copy Data" activity.

    It copies data, not files. Basically, it opens the source file, reads it line by line or in bulk, creates a file at the destination, injects the data into that destination file, and then saves it.

    To copy or move files as files, you need to use Azure Logic Apps (ALA) rather than Azure Data Factory (ADF):

    https://azure.microsoft.com/en-ca/resources/templates/101-logic-app-ftp-to-blob/

    Or you can use Azure Functions:

    https://cmatskas.com/copy-azure-blob-data-between-storage-accounts-using-functions/
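
    To make the Functions approach concrete, here is a minimal sketch of the same server-side blob copy using the azure-storage-blob Python SDK (the linked article does it in C#). The connection strings, container names, and blob names are placeholders, not anything from this thread:

    ```python
    # Minimal sketch of a server-side blob copy between two storage accounts,
    # the same idea the linked Functions article implements in C#.
    # Requires the azure-storage-blob package; connection strings,
    # container names, and blob names below are placeholders.
    from azure.storage.blob import BlobServiceClient

    src_svc = BlobServiceClient.from_connection_string("<source-connection-string>")
    dst_svc = BlobServiceClient.from_connection_string("<destination-connection-string>")

    # Source blob URL; append a SAS token if the source container is private.
    source_blob = src_svc.get_blob_client(container="input", blob="big-file.csv")

    # start_copy_from_url asks the storage service to copy server-side,
    # so the bytes never flow through the caller's memory.
    dest_blob = dst_svc.get_blob_client(container="archive", blob="big-file.csv")
    result = dest_blob.start_copy_from_url(source_blob.url)
    print(result["copy_status"])  # 'success', or 'pending' for large blobs
    ```

    Because the copy is server-side, the function only issues the request and the storage service moves the bytes, which matters when the files are large.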


    I prefer Logic Apps.


    Sincerely Nik -- Please kindly mark the post(s) that answered your question and/or vote for the post(s). http://sqldataside.blogspot.ca/ (SQL Tabular + PowerShell)

    Tuesday, August 28, 2018 11:39 AM
  • Hi Thomas,

    ADF currently supports three kinds of trigger: the schedule trigger, the tumbling window trigger, and the blob event trigger.

    With the schedule trigger, the highest frequency is 1 minute. With the tumbling window trigger, the highest frequency is 15 minutes. The blob event trigger depends on file creation/deletion events. Therefore, I think a 5-second interval is not possible except with the blob event trigger.

    There is a performance tuning tutorial in case you want to take a look.
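
    For reference, this is roughly how a 1-minute schedule trigger (the highest schedule frequency) could be created with the azure-mgmt-datafactory Python SDK. The subscription, resource group, factory, trigger, and pipeline names are placeholders, and the exact credential setup depends on your SDK version:

    ```python
    # Rough sketch: create an ADF schedule trigger firing once per minute.
    # All resource names below are placeholders.
    from datetime import datetime, timezone
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        PipelineReference, ScheduleTrigger, ScheduleTriggerRecurrence,
        TriggerPipelineReference, TriggerResource,
    )

    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    trigger = TriggerResource(properties=ScheduleTrigger(
        recurrence=ScheduleTriggerRecurrence(
            frequency="Minute",   # minutes are the smallest schedule unit
            interval=1,           # once per minute; 5 seconds is not expressible
            start_time=datetime(2018, 8, 29, tzinfo=timezone.utc),
            time_zone="UTC",
        ),
        pipelines=[TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name="CopyPipeline"),
            parameters={},
        )],
    ))

    adf.triggers.create_or_update("<resource-group>", "<factory-name>",
                                  "EveryMinuteTrigger", trigger)
    ```
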
    Tuesday, August 28, 2018 12:35 PM
  • Well, no, there's no limit on the number of Copy Activities you can run. They'd just have to pay for it.

    However, "10 relatively large files every 5 secs" sounds like you have a different issue. Depending on how large "large" is, I'd say that pattern isn't sustainable: they're trying to use a file mechanism for a real-time integration, which probably won't work the way they want.

    Why do they think they need this? If you explain the situation, maybe we can recommend a more suitable pattern.

    There are no manual performance tuning options.

    Tuesday, August 28, 2018 12:42 PM
  • Hi there,

    When the activities happen very frequently and performance is important, it may be a good use case for always-on Azure Functions or Logic Apps, as stated above.

    An Azure Batch account with an ADF custom activity could also be a solution that performs well with some parallelism and meets the requirements. However, if the job runs for a long time or very frequently, it may not be a good candidate for ADF; an always-on solution might be a better option. A rough sketch of the always-on pattern follows.
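
    As an illustration of that always-on pattern (not anything prescribed in this thread), a long-running worker, e.g. a WebJob, a container, or an always-on Function host, can poll at the 5-second cadence that no ADF trigger offers. The connection string and container names are placeholders:

    ```python
    # Rough sketch of an always-on worker that copies new blobs every 5 seconds,
    # a cadence no ADF trigger can provide. The connection string and container
    # names are placeholders; a real worker would also need error handling and
    # durable state instead of the in-memory 'seen' set.
    import time
    from azure.storage.blob import BlobServiceClient

    svc = BlobServiceClient.from_connection_string("<connection-string>")
    source = svc.get_container_client("incoming")

    seen = set()
    while True:
        for props in source.list_blobs():
            if props.name in seen:
                continue
            src_url = source.get_blob_client(props.name).url  # add a SAS if private
            # Server-side copy; the worker only issues the request.
            svc.get_blob_client(container="processed", blob=props.name) \
               .start_copy_from_url(src_url)
            seen.add(props.name)
        time.sleep(5)  # the 5-second interval from the original question
    ```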

    cheers,

    Andrew


    Andrew Sears

    Wednesday, August 29, 2018 2:20 AM
  • I feel the scenario is moving towards streaming data.

    Sincerely Nik -- Please kindly mark the post(s) that answered your question and/or vote for the post(s). http://sqldataside.blogspot.ca/ (SQL Tabular + PowerShell)

    Monday, September 3, 2018 12:56 PM