none
u-SQL Declaring your input based on year and month of file

    Question

  • I have lots of files in my data lake. For example Sales_20181220.csv Sales_20181221.csv

    I want my SQL Script to only get files for the current month and as the source (201812)

    I then want to name my output as the current month

    Currently I have this test which works. It just takes all the csvs and adds data to an output file (Which currently isnt dynamic at all

    DECLARE @Sales_file  string = "/Salesdemo/{*}.csv";
    DECLARE @out string = "/MonthlySales/AggSales_201812.csv";
    
    @sales =
    EXTRACT DateID int,
            OrderDate DateTime,
            ModifiedDate DateTime,
            SalesOrderNumber string,
            SalesAmount decimal,
            ProductName string
    FROM @Sales_file
    USING Extractors.Csv();
    
    @Totals =
    SELECT MIN(OrderDate) AS MinOrderDate, 
    MAX(OrderDate)AS MaxOrderDate, 
    COUNT(*) AS TotalOrders, 
    SUM(SalesAmount)AS TotalSalesAmount
    FROM @sales;
    
    OUTPUT @Totals
        TO @out
        USING Outputters.Csv();
    
    

    Any idea how I make the input and output dynamic?


    Debbie

    Friday, January 4, 2019 11:55 AM

Answers

  • Hi Debbie,

    For filesets with dates, I would recommend you to go through this doc :

    https://github.com/saveenr/usql-tutorial/blob/master/filesets/filesets-with-dates.md


    MSDN

    Tuesday, January 8, 2019 12:05 PM
    Moderator
  • Thanks for that

    Ill have a go at that. I got my U-SQL Script at least working

    //First of all we want the current date
    DECLARE @now DateTime = DateTime.Now;
    //The data is taken from yesterday so we needs to use yesterdays date
    DECLARE @yesterday  = @now.AddDays(-1);
    //Now we are taking date information from Yesterdays variable
    DECLARE @yesterdayString string = @yesterday.ToString("yyyyMM");
    DECLARE @year string = @yesterday.ToString("yyyy");
    DECLARE @month string = @yesterday.ToString("MM");
    DECLARE @day string = @yesterday.ToString("dd");
    
    //Now we want to declare the source folder path and the prefix of the source file. We will need to add the day folder as a wildcard
    //Now we have switched to __filename  @sourceFilePrefix isnt being used anymore
    DECLARE @sourceFolderPath string = "/Salesdemo/";
    DECLARE @sourceFilePrefix string = "Sales_";
    
    DECLARE @fileExtension string = ".csv";
    DECLARE @destinationFilePrefix string = "/MonthlySales/AggSales_";
    
    //These Are commented out because I have used them for testing purposes. 
    //DECLARE @salesFile  string = @sourceFolderPath + @sourceFilePrefix + @nowString + "{*}" + @fileExtension;
    //DECLARE @salesFile  string = @sourceFolderPath + @year + "/" + @month + "/" + "05"  + "/"  + @sourceFilePrefix + @nowString + "{*}" + @fileExtension;
    
    //Note  __filename is included so we can add __filename into the Extract. the day folder has been wild carded
    DECLARE @salesFile  string = @sourceFolderPath + @year + "/" + @month + "/" + "{*}"  + "/"  + "{__filename}";
    DECLARE @outputFile string = @destinationFilePrefix + @yesterdayString +  @fileExtension;
    
    @sales =
    EXTRACT DateID int,
            OrderDate DateTime,
            ModifiedDate DateTime,
            SalesOrderNumber string,
            SalesAmount decimal,
            ProductName string,
            __filename string
    FROM @salesFile
    USING Extractors.Csv();
    
    @totals =
    SELECT __filename, MIN(OrderDate) AS MinOrderDate, 
    MAX(OrderDate)AS MaxOrderDate, 
    COUNT(*) AS TotalOrders, 
    SUM(SalesAmount)AS TotalSalesAmount
    FROM @sales
    GROUP BY __filename ;
    
    OUTPUT @totals
        TO @outputFile
        USING Outputters.Csv();
    
    


    Debbie

    Tuesday, January 8, 2019 4:49 PM

All replies

  • Hi Debbie,

    For filesets with dates, I would recommend you to go through this doc :

    https://github.com/saveenr/usql-tutorial/blob/master/filesets/filesets-with-dates.md


    MSDN

    Tuesday, January 8, 2019 12:05 PM
    Moderator
  • Thanks for that

    Ill have a go at that. I got my U-SQL Script at least working

    //First of all we want the current date
    DECLARE @now DateTime = DateTime.Now;
    //The data is taken from yesterday so we needs to use yesterdays date
    DECLARE @yesterday  = @now.AddDays(-1);
    //Now we are taking date information from Yesterdays variable
    DECLARE @yesterdayString string = @yesterday.ToString("yyyyMM");
    DECLARE @year string = @yesterday.ToString("yyyy");
    DECLARE @month string = @yesterday.ToString("MM");
    DECLARE @day string = @yesterday.ToString("dd");
    
    //Now we want to declare the source folder path and the prefix of the source file. We will need to add the day folder as a wildcard
    //Now we have switched to __filename  @sourceFilePrefix isnt being used anymore
    DECLARE @sourceFolderPath string = "/Salesdemo/";
    DECLARE @sourceFilePrefix string = "Sales_";
    
    DECLARE @fileExtension string = ".csv";
    DECLARE @destinationFilePrefix string = "/MonthlySales/AggSales_";
    
    //These Are commented out because I have used them for testing purposes. 
    //DECLARE @salesFile  string = @sourceFolderPath + @sourceFilePrefix + @nowString + "{*}" + @fileExtension;
    //DECLARE @salesFile  string = @sourceFolderPath + @year + "/" + @month + "/" + "05"  + "/"  + @sourceFilePrefix + @nowString + "{*}" + @fileExtension;
    
    //Note  __filename is included so we can add __filename into the Extract. the day folder has been wild carded
    DECLARE @salesFile  string = @sourceFolderPath + @year + "/" + @month + "/" + "{*}"  + "/"  + "{__filename}";
    DECLARE @outputFile string = @destinationFilePrefix + @yesterdayString +  @fileExtension;
    
    @sales =
    EXTRACT DateID int,
            OrderDate DateTime,
            ModifiedDate DateTime,
            SalesOrderNumber string,
            SalesAmount decimal,
            ProductName string,
            __filename string
    FROM @salesFile
    USING Extractors.Csv();
    
    @totals =
    SELECT __filename, MIN(OrderDate) AS MinOrderDate, 
    MAX(OrderDate)AS MaxOrderDate, 
    COUNT(*) AS TotalOrders, 
    SUM(SalesAmount)AS TotalSalesAmount
    FROM @sales
    GROUP BY __filename ;
    
    OUTPUT @totals
        TO @outputFile
        USING Outputters.Csv();
    
    


    Debbie

    Tuesday, January 8, 2019 4:49 PM
  • Great ! Thanks for sharing your findings as well.

    MSDN

    Wednesday, January 9, 2019 6:50 AM
    Moderator