File validation and re-send when copying and moving data

    Question

  • Hi Azure Experts,

    I have a potential customer who wishes to use ADF V2 for data integration.

    They want to move and copy very large files from an on-premises server to a cloud server, and will probably use ADF for data integration afterwards.

    I have read this article: https://docs.microsoft.com/en-us/azure/data-factory/connector-file-system

    I have the following questions:

    1. Can the ADF Copy function perform file validation (MD5, CRC, etc.) of the source and copied files?

    2. If the network disconnects during a file move or copy, can the ADF Copy function re-send the file?

    Thanks so much in advance.

    Best regards.



    • Edited by Thomas6565 Monday, August 27, 2018 4:38 AM
    Monday, August 27, 2018 4:36 AM

All replies

  • Hello,

    1.  Yes, the Get Metadata activity supports MD5:

    https://docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity

    2.  Yes, pipeline activities support configurable retry attempts:

    https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities#activity-policy
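    As a rough illustration of item 2, the activity policy described on the linked page is a "policy" object attached to an activity definition, with "timeout", "retry" and "retryIntervalInSeconds" properties. The sketch below assembles a Copy activity definition as a Python dict and serializes it to JSON. The activity and dataset names (CopyOnPremToBlob, OnPremFileDataset, BlobSinkDataset) and the values (3 retries, 60 seconds between attempts, 12-hour timeout) are hypothetical placeholders, not recommendations.

```python
import json


def build_copy_activity(retry_count: int = 3,
                        retry_interval_s: int = 60,
                        timeout: str = "0.12:00:00") -> dict:
    """Return a Copy activity definition with an activity-level retry policy.

    Activity and dataset names are hypothetical placeholders.
    """
    return {
        "name": "CopyOnPremToBlob",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": "OnPremFileDataset",  # hypothetical
                    "type": "DatasetReference"}],
        "outputs": [{"referenceName": "BlobSinkDataset",  # hypothetical
                     "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "FileSystemSource", "recursive": True},
            "sink": {"type": "BlobSink"},
        },
        # If an attempt fails (for example because the network drops),
        # the activity is re-executed up to `retry` more times, waiting
        # `retryIntervalInSeconds` between attempts. `timeout` uses the
        # d.hh:mm:ss TimeSpan format.
        "policy": {
            "timeout": timeout,
            "retry": retry_count,
            "retryIntervalInSeconds": retry_interval_s,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_copy_activity(), indent=2))
```

    The printed JSON can be compared against the activity JSON shown in the ADF authoring UI; the property names should be checked against the current documentation before use.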

    I would recommend taking a look at the documentation on Copy Activity performance tuning. You may find it beneficial to test features such as Parallel Copy, Staged Copy, and scaling out the self-hosted IR:

    https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance
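    Parallel Copy and Staged Copy are set in the Copy activity's typeProperties ("parallelCopies", "enableStaging", "stagingSettings"), while scaling out the self-hosted IR is done by adding nodes to the integration runtime rather than in the activity JSON. The sketch below, a hypothetical helper, adds those two settings to an existing typeProperties dict. The staging linked service name and path are placeholders, and whether staging actually helps for a file-system-to-cloud copy is something to confirm against the performance guide and by testing.

```python
import json


def add_performance_settings(type_properties: dict,
                             parallel_copies: int,
                             staging_linked_service: str,
                             staging_path: str) -> dict:
    """Return a copy of a Copy activity's typeProperties with tuning options.

    The staging linked service name and path are hypothetical placeholders.
    """
    if parallel_copies < 1:
        raise ValueError("parallel_copies must be at least 1")
    tuned = dict(type_properties)  # shallow copy; the input is left unchanged
    # Upper bound on the number of parallel copy threads for this activity.
    tuned["parallelCopies"] = parallel_copies
    # Route the data through an interim Azure storage location.
    tuned["enableStaging"] = True
    tuned["stagingSettings"] = {
        "linkedServiceName": {
            "referenceName": staging_linked_service,
            "type": "LinkedServiceReference",
        },
        "path": staging_path,
    }
    return tuned


if __name__ == "__main__":
    base = {"source": {"type": "FileSystemSource"}, "sink": {"type": "BlobSink"}}
    print(json.dumps(add_performance_settings(base, 8, "StagingBlobStorage",
                                              "staging/copy"), indent=2))
```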

    I hope this helps! :)


    Monday, August 27, 2018 5:18 PM
    Moderator
  • Hi Jason,

    Thanks for the links.

    I'm sorry, I don't understand the table. On the page https://docs.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity, the "contentMD5" column shows a cross in the "File System" row. Only the "Azure Blob" row has a tick in the "contentMD5" column.

    Does this mean ADF is unable to perform an MD5 checksum check when transferring ordinary files from a file system? My customer would like to transfer all sorts of files: CAD, Excel, text files, database backups, BLOBs, etc.

    Thanks again.

    Cheers,

    Thomas

    Tuesday, August 28, 2018 1:47 AM
  • Hi Thomas,

    Thanks for pointing this out. MD5 is only supported for objects in Azure Blob storage.

    All of the file types you mentioned would be supported if they were stored in Azure Blob storage, but you mentioned that your source files are on-premises.
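    Since contentMD5 is only returned for Azure Blob, one possible workaround (an assumption, not a built-in ADF feature) is to compute the MD5 of each source file on the on-premises side and compare it with the contentMD5 value that Get Metadata reports for the copied blob. Blob storage exposes Content-MD5 as the base64 encoding of the raw 16-byte digest, so the sketch below encodes the local hash the same way. The function names are hypothetical, and whether the sink blob has a Content-MD5 value set after the copy should be verified in your environment.

```python
import base64
import hashlib


def content_md5(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Return the base64-encoded MD5 digest of a file, read in chunks.

    Reading in chunks keeps memory use flat for very large files.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    # Base64 of the raw digest, the same encoding as the blob Content-MD5.
    return base64.b64encode(md5.digest()).decode("ascii")


def matches(path: str, blob_content_md5: str) -> bool:
    """Compare a local file with the contentMD5 reported for the copied blob."""
    return content_md5(path) == blob_content_md5
```

    The value passed as blob_content_md5 would come from the Get Metadata activity output for the sink blob; how that value is fed back to the on-premises side is left open here.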

    If you are moving a large amount of data, you might also want to take a look here for performance tuning tips:

    https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance



    Tuesday, August 28, 2018 7:05 PM
    Moderator