Compress CSV file through U-SQL script

  • Question

  • Hello all!

I have a question about compression and CSV files. Let me first give a short description of the current setup. We have managed to set up IoT Hub with Blob storage behind it (Avro-compressed CSV files are stored there). We then use Azure Data Factory triggers to aggregate the data via U-SQL scripts, which produce CSV files. From time to time these files are quite big, so my question is: is there any option to compress the output files in U-SQL scripts? From our tests, compressing such files with gzip or similar makes quite a big difference in size.

My thoughts are: if U-SQL scripts support custom C# methods and functions, shouldn't there be a way to call a method in the OUTPUT statement that compresses the output file? Or alternatively, create another job that takes care of compressing the CSV files and moves them to another location in Blob storage / Data Lake Store?
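To illustrate the second idea (a separate job that compresses the CSV files), here is a minimal, hypothetical sketch in Python. It only shows the local compression step with gzip in streaming fashion; in a real job the file would then be uploaded to Blob storage / Data Lake Store with the Azure SDK. All names here are made up for the example:

```python
import csv
import gzip
import os
import shutil
import tempfile

def compress_csv(src_path, dest_path):
    """Gzip-compress a file in streaming fashion, without loading it fully into memory."""
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)

# Demo with a small, repetitive CSV (the kind of telemetry data that compresses well).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "readings.csv")
dst = src + ".gz"

with open(src, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["device_id", "timestamp", "temperature"])
    for i in range(10_000):
        writer.writerow([f"device-{i % 10}", 1529650000 + i, 21.5])

compress_csv(src, dst)
print(os.path.getsize(src), "->", os.path.getsize(dst))
```

Because telemetry CSVs are highly repetitive, the gzip copy typically ends up a small fraction of the original size, which matches the "quite big difference" observed in the tests mentioned above.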

    Thank you

    Friday, June 22, 2018 7:15 AM

All replies

  • Hi!

    Have you considered using a custom activity in Data Factory:

    https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity

    As far as U-SQL goes, I see an EXTRACT expression but I do not see any references to compression:

    https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/extract-expression-u-sql 

    Friday, June 22, 2018 7:30 PM
  • Hello Jason_J!

Thank you for your suggestion and answer. If I understand you correctly, the way to go is:

1. Use an Azure Data Factory custom activity.

2. Implement a compression method, which will effectively "duplicate" the files, so the uncompressed CSV files remain alongside the compressed versions.

Maybe this is the way to go. What do you think?

    Thank you

    Tuesday, June 26, 2018 10:15 AM