Gzip to Parquet conversion in ADF

  • Question

  • I was able to convert uncompressed delimited files in ADLS Gen2 into Parquet format, writing them to another file system. However, when I try to do the same with .gz files and convert them to Parquet in the destination, ADF does not allow it. Can this be done in ADF, and if so, how should I proceed?

    (When processing a .gz file I select binary copy, but then the destination has to use a similar compression format rather than Parquet. If I don't select binary copy, ADF tries to read the schema, which it cannot do.)

    Any help is greatly appreciated.

    Wednesday, November 13, 2019 4:11 PM

All replies

  • Hello,

    You are seeing this issue because your source is a gzip (.gz) file and your destination is Parquet. Binary copy is just a simple copy with no schema mapping. I suggest using two copy activities: use the first one to decompress the gzip file, and the second one to convert the decompressed file to Parquet.
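    The two-step idea can be illustrated outside ADF with a small Python sketch. This is not ADF code; it assumes a small, hypothetical comma-delimited UTF-8 sample, and the helper names are made up for illustration. It only shows why the raw .gz bytes cannot be read as delimited text (so no schema is available), while the decompressed output exposes a header and rows that a second step could map to Parquet.

```python
import csv
import gzip
import io
from typing import List, Tuple


def decompress_gzip(data: bytes) -> bytes:
    """Step 1 (stand-in for the first copy activity): remove the gzip layer."""
    return gzip.decompress(data)


def read_delimited(data: bytes, delimiter: str = ",") -> Tuple[List[str], List[List[str]]]:
    """Step 2 input: parse the header (schema) and data rows of decompressed text."""
    reader = csv.reader(io.StringIO(data.decode("utf-8")), delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    return header, rows


def is_readable_as_text(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 text (needed to infer a schema here)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample content, compressed the way a .gz source file would be.
sample_csv = b"id,name\n1,alpha\n2,beta\n"
gz_bytes = gzip.compress(sample_csv)

# The gzip magic bytes (0x1f 0x8b) are not valid UTF-8, so the raw file is not readable text.
raw_readable = is_readable_as_text(gz_bytes)
header, rows = read_delimited(decompress_gzip(gz_bytes))
print(raw_readable)  # False
print(header)        # ['id', 'name']
print(rows)          # [['1', 'alpha'], ['2', 'beta']]
```

    Writing the Parquet file itself would need a Parquet library (for example pyarrow's `pyarrow.parquet.write_table`); in the approach suggested above, that step corresponds to the second copy activity writing to a Parquet dataset.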

    Let me know how it goes .


    Thanks, Himanshu

    Thursday, November 14, 2019 11:14 PM
  • Use Data Flow, which is built for scale, to handle large Parquet files in the lake.
    Friday, November 15, 2019 7:08 PM
  • Hello azdevad1,

    We have not heard back from you on this and are following up to check whether the issue is resolved. If you have found a resolution, please share it with the community.


    Thanks, Himanshu

    Monday, November 18, 2019 8:14 PM