Compression and decompression while copying an on-premises file to Azure Data Lake

    Question

  • Hello,

    I am trying to copy a 100 GB data set from an on-premises file system to Azure Data Lake. I have created linked services and input and output data sets, which are linked by the pipeline. However, the data copies very slowly, and sometimes the activity stops copying altogether; I don't know why.

    Later, to improve performance and speed up the copy activity, I applied compression to the input and output data sets, but that produced the error below:

    Copy activity encountered a user error: ErrorCode=UserErrorAdlsFileWriteFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Writing to 'AzureDataLakeStore' failed. 'One or more errors occurred.',Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The source data has an invalid format. Cannot decompress the source data.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.IO.InvalidDataException,Message=The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.,Source=System,''Type=System.OperationCanceledException,Message=The operation was canceled.,Source=System,''Type=System.OperationCanceledException,Message=The operation was canceled.,Source=mscorlib,''Type=System.OperationCanceledException,Message=The operation was canceled.,Source=System,'.
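
    The innermost exception in this message says the source stream does not begin with the GZip magic number. One possible reading, offered only as a guess, is that the input data set declares GZip compression while the on-premises files are actually stored uncompressed. A minimal sketch for checking that locally follows; `looks_like_gzip` is a hypothetical helper name, not part of Azure Data Factory, and the check relies only on the two-byte gzip signature `1f 8b`.

```python
import gzip
import os
import tempfile

# Every gzip member starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    # Demo on two throwaway files: one plain text, one really gzip-compressed.
    workdir = tempfile.mkdtemp()
    plain_path = os.path.join(workdir, "sample.csv")
    gz_path = os.path.join(workdir, "sample.csv.gz")

    with open(plain_path, "wb") as f:
        f.write(b"id,name\n1,example\n")
    with gzip.open(gz_path, "wb") as f:
        f.write(b"id,name\n1,example\n")

    for path in (plain_path, gz_path):
        print(os.path.basename(path), looks_like_gzip(path))
```

    If the source files report `False`, they are not gzip streams, and telling the copy to decompress them would be expected to fail in this way.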

    I am not sure what to do about this error or how to speed up the process. I want to speed it up while running only one activity in the pipeline.

    What else can I do, or is there an alternative?


    Thanks, Manthan Upadhyay

    Friday, March 4, 2016 6:50 AM

All replies

  • Hi Manthan,

    Let's first understand why your uncompressed data is transferring slowly.

    You mentioned you have an on-premises file system, correct? The transfer speed is of course influenced by many factors, e.g. the ADF gateway machine spec, your internet connection, etc. In our tests, we have been able to get around 100 MBps copy speeds for certain file profiles with beefy gateway machines. So, if you could tell us what your copy speed was and help identify the bottleneck, we can figure out how to speed up your copying.
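
    To put a number on "copy speed", a rough sketch like the one below may help. It assumes "MBps" means megabytes per second with 1 MB = 2^20 bytes, and both function names are hypothetical; the bytes copied and elapsed time would have to come from your own observation of the copy run.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes


def throughput_mbps(bytes_copied, seconds):
    """Observed copy speed in MB per second."""
    return bytes_copied / BYTES_PER_MB / seconds


def estimated_seconds(total_bytes, mbps):
    """Time needed to copy `total_bytes` at a steady `mbps` MB per second."""
    return total_bytes / BYTES_PER_MB / mbps


if __name__ == "__main__":
    data_set_bytes = 100 * 1024 ** 3  # the 100 GB data set from the question
    # At the roughly 100 MBps quoted above, the copy would take about 17 minutes.
    print(estimated_seconds(data_set_bytes, 100))  # prints 1024.0 (seconds)
```

    Comparing your measured figure against that estimate gives a sense of how far the current run is from the speeds mentioned here.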

    Thanks,
    Sachin Sheth
    Program Manager
    Azure Data Lake

    Saturday, March 5, 2016 12:24 AM