How to incrementally copy and parse files from an FTP server?

    Question

  • Hi all, I have a requirement to copy data from an FTP server to Azure Blob storage, and I have two questions I need help with:

    1. How can I copy incrementally? From the documentation here, it looks like only SQL supports incremental operation?

    2. If #1 is possible, I'll be copying GZ files that contain poorly formatted CSV files. Can I apply custom parsing logic line by line with Data Factory?

    Thanks in advance!

    Thursday, May 3, 2018 5:40 PM

Answers

  • Hi Tony,

    FTP doesn't support incremental copy. One workaround is to put new files in a new folder whose name contains the date and time, and have the Copy activity copy that folder using a Data Factory expression (see https://docs.microsoft.com/en-us/azure/data-factory/control-flow-system-variables) to achieve incremental copy.
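
    As a rough sketch of the dated-folder workaround, the snippet below builds the folder path for a given run time in Python. The root folder name `incoming` and the year/month/day layout are assumptions for illustration only. In a pipeline, the equivalent would likely be an expression along the lines of `@concat('incoming/', formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd'))`, but please verify that against the system variables page linked above.

```python
from datetime import datetime


def dated_folder(run_time: datetime, root: str = "incoming") -> str:
    """Build the folder path that holds the files for one run.

    `root` and the year/month/day layout are hypothetical; use whatever
    naming scheme the process writing to the FTP server follows.
    """
    return f"{root}/{run_time.strftime('%Y/%m/%d')}"


if __name__ == "__main__":
    # Each run copies only the folder that matches its own date.
    print(dated_folder(datetime(2018, 5, 7, 3, 53)))
```

    Because each run reads only its own dated folder, files copied by earlier runs are never picked up again.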

    If you want to apply custom parsing logic to the CSV files, the suggested approach is to use a .NET activity.
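
    To give an idea of what the line-by-line logic inside such a custom step might look like, here is a minimal sketch. It is written in Python rather than as a .NET activity, so treat it as an illustration of the parsing logic only. The function name `split_rows`, the expected field count, and the rule "a line is malformed if it has the wrong number of fields" are all assumptions. It also assumes no quoted field spans more than one line.

```python
import csv
import gzip
import io


def split_rows(gz_bytes: bytes, expected_fields: int):
    """Separate well-formed CSV rows from malformed lines in a GZ file.

    Returns (good, bad): `good` is a list of parsed rows, `bad` is a list
    of (line_number, raw_line) pairs for later inspection.
    """
    good, bad = [], []
    # gzip.open accepts a file object, so the bytes can stay in memory.
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            # Parse one line at a time so a bad line cannot affect the next.
            row = next(csv.reader([line]))
            if len(row) == expected_fields:
                good.append(row)
            else:
                bad.append((line_number, line.rstrip("\n")))
    return good, bad


if __name__ == "__main__":
    sample = gzip.compress(b"a,b,c\n1,2,3\nbroken line\n\n4,5,6\n")
    rows, rejected = split_rows(sample, expected_fields=3)
    print(rows)
    print(rejected)
```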

    Regards,

    Gary

    Monday, May 7, 2018 3:53 AM

All replies

  • Hi Tony,

    There's another way to achieve incremental copy in ADF V2. Please take a look at this related thread: https://stackoverflow.com/questions/50298122/azure-data-factory-incremental-data-load-from-sftp-to-blob. The same approach works for FTP.

    Regards,

    Gary

    • Proposed as answer by Gary Zhu Tuesday, May 15, 2018 2:17 AM
    Monday, May 14, 2018 2:18 AM
  • Thanks for the answer!

    Another question: I see the sink has a copy behavior option to merge files, but it didn't work and failed with this error message:

    "Activity IncrementalCopy failed: ErrorCode=UserErrorFormatRequiredWithCopyBehaviorMergeFiles,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Format setting is required on both source and sink for 'MergeFiles' copy behavior.,Source=Microsoft.DataTransfer.ClientLibrary,'".

    Do you have any idea about this?

    Thanks,

    Tony

    Monday, May 14, 2018 9:16 PM
  • ADF doesn't support direct binary concatenation. The "MergeFiles" behavior applies to source files that contain tabular data, e.g. in CSV, JSON, or AVRO format. During a merge copy, each row of these files is extracted and written into the same sink file. That's why the error message requires you to provide a format.
    Tuesday, May 15, 2018 2:16 AM
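
    The difference between a row-level merge and a binary concatenation can be sketched in a few lines of Python. This is only an illustration of the idea, not how ADF implements it. The helper name `merge_csv` is hypothetical, and it assumes every source file starts with the same header row.

```python
import csv
import io


def merge_csv(sources):
    """Merge several CSV texts row by row into one CSV text.

    Each source is parsed as CSV, which is why the format has to be known.
    Assumes the first row of every source is the same header.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for text in sources:
        reader = csv.reader(io.StringIO(text))
        header = next(reader, None)
        if header is None:
            continue  # empty source, nothing to merge
        if not header_written:
            writer.writerow(header)
            header_written = True
        # Copy the data rows only; the header is not repeated.
        writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    first = "id,name\n1,a\n"
    second = "id,name\n2,b\n"
    print(merge_csv([first, second]))
```

    A plain byte-level concatenation of the same two files would repeat the `id,name` header in the middle of the output, which is one reason a merge has to understand the file format.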