Can we process multiple files in chronological order from U-SQL?


  • Can we extract data from multiple files in sequence? Suppose a folder contains the files SearchLog_20171101.csv, SearchLog_20171103.csv, SearchLog_20171102.csv, SearchLog_20171104.csv, and SearchLog_20171105.csv. I need my script to process SearchLog_20171101.csv first, then SearchLog_20171102.csv, and so on. Is that possible in U-SQL?


    Friday, November 17, 2017 11:41 AM

All replies

  • Hi Pawan

    U-SQL is a declarative script that will try to scale out your processing to the available resources (specified parallelism). 

    Thus, you can write something like:

    @data = EXTRACT ..., date DateTime FROM "SearchLog_{date:yyyy}{date:MM}{date:dd}.csv" USING Extractors.Csv();

    However, that will process all data "at the same time". Why do you need to process the files in sequence?
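
    For illustration, the file set pattern above could be filled out as in the following sketch. The column names (UserId, Query) and the /input and /output paths are assumptions, since the original column list is elided. The ORDER BY on OUTPUT only sorts the rows that are written; it does not make the job read the files one after another.

```usql
// Hypothetical schema and paths; adjust to the real SearchLog layout.
@data =
    EXTRACT UserId int,
            Query string,
            date DateTime   // virtual column, populated from the file name
    FROM "/input/SearchLog_{date:yyyy}{date:MM}{date:dd}.csv"
    USING Extractors.Csv();

// All matching files are read in parallel; rows are sorted only on output.
OUTPUT @data
TO "/output/SearchLog_by_date.csv"
ORDER BY date ASC
USING Outputters.Csv();
```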

    Michael Rys

    Friday, November 17, 2017 10:36 PM
  • Hi Michael,

    Thanks for your response. We are required to process the files in sequential order because of a business processing rule.

    I am trying to avoid ADF for this process if we can achieve it from a U-SQL script. But as per my understanding, we can't do this with U-SQL.

    Monday, November 20, 2017 6:07 AM
  • Hi Michael

    You can achieve this in U-SQL: read the timestamp from the file name into a virtual column in your EXTRACT.

    Next, you can order the rows by this virtual column and process them.

    If you only need to process the latest record among all of these, use the ROW_NUMBER() function with PARTITION BY on the columns that identify a unique row, ORDER BY the virtual timestamp column in descending order, and keep the rows where the row number is 1. This will always give you the latest record among the multiple rows.
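
    A minimal sketch of the ROW_NUMBER() approach described above. It assumes a hypothetical schema in which UserId identifies a unique row and the date is taken from the file name; the column names and paths are not from the original thread.

```usql
// Hypothetical schema and paths; the date column is a virtual column
// populated from the file name.
@data =
    EXTRACT UserId int,
            Query string,
            date DateTime
    FROM "/input/SearchLog_{date:yyyy}{date:MM}{date:dd}.csv"
    USING Extractors.Csv();

// Number the rows of each UserId from newest file date to oldest.
@ranked =
    SELECT UserId,
           Query,
           date,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY date DESC) AS rn
    FROM @data;

// Keep only the newest row per UserId.
@latest =
    SELECT UserId,
           Query,
           date
    FROM @ranked
    WHERE rn == 1;

OUTPUT @latest
TO "/output/SearchLog_latest.csv"
USING Outputters.Csv();
```

    If several rows for the same key share the same file date, the choice among them is arbitrary unless another column is added to the ORDER BY.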

    Friday, June 22, 2018 5:14 AM