none
USQL DELETE Statement

    Question

  • Hi Guys,

    Here is my scenario.

    I'm extracting data from text files, then inserting it into Data Lake Analytics (DLA) Table.

    Before doing that, I need to inner join the incoming data with the existing data in DLA Table then delete the data in DLA table before inserting the incoming data.

    I'm looking for a solution or a way to Delete data from DLA Table, What is the Delete Syntax?

    Here is my current script

    //DECLARE @DataFilesPath string = "Property Management System/Opera/Hyatt/{FileName:*}.TXT";
    DECLARE @DataFilesPath string = "/Opera/Hyatt/{FileName:*}.TXT";
    
    @OperaHyattExtract =
        EXTRACT 
                Pms string,
                InnCodeID string,
                CategoryID int?,
                AccountID string,
                Date DateTime,
                AccountName string,
                Amount float,
                FileName string
        FROM @DataFilesPath
        USING Extractors.Text(delimiter : '|');
    
    @TableOperaHyattDailyOP = 
    SELECT T.*     
        FROM Staging.OperaHyattDailyOP AS T 
             INNER JOIN 
                 @OperaHyattExtract AS O 
                 ON T.FileName == O.FileName;
    
    //DELETE 
    
    OUTPUT @TableOperaHyattDailyOP
        TO "/Output/Opera/Hyatt/OPInnerJoinOutput.csv"
        USING Outputters.Csv();
    


    Codernater

    Tuesday, November 8, 2016 6:46 PM

All replies

  • U-SQL does not support fine-grained operations like DELETE or UPDATE. Fortunately you can TRUNCATE your table, so a workaround would be to load the data that you don't want to delete in a variable, truncate the table and then insert the data that you want to keep together with your new data.

    Jorg Klein's Microsoft Business Intelligence Blog

    Wednesday, November 9, 2016 10:26 AM
  • Thank you for your response.

    in my current process, I believe this solution will not work because in DLA Analytics we have Staging.Table1

    and PMS.Table1 (this table is final destination for the data)

    PMS.Table1 will hold all the records of the data and we are talking close to a million records per day, therefore, I don't see the process to be scalable in the future.

    I can explain more in details my bigdata project, would it be okay if you can send me your email so I can explain visually my process?

    Here is my linkedin profile for us to connect. linkedin profile

    Thank you for your time.

    John


    Codernater


    • Edited by Codernater Wednesday, November 9, 2016 10:33 PM forgot to add the link for my linkedin
    Wednesday, November 9, 2016 7:20 PM
  • John,

    Is it an option to partition your table in a way that deleting particular partitions would do the job?

    See: http://www.sqlservercentral.com/articles/U-SQL/143722/

    I prefer the discussion here so other people can also benefit from it. And who knows someone else here has a great idea or suggestion ;-)


    Jorg Klein's Microsoft Business Intelligence Blog

    Thursday, November 10, 2016 1:52 PM
  • I think partition will work, but I don't think I can use it because my partition has to be by file and per date. The blog is showing a static variable on the partition. I would like the partition to be dynamic based on at least file name.

    Also I don't know if the partition has impact on performance on the long run.

    On another hand, it was brought to my attention by a friend to try the following:

    1. process files per date, meaning DLS storage would be like this: YYYY\MM\DD\File01012016.txt
    2. Output csv file per date
    3. Now if I need to process a new file that I already processed before the output file will overwrite the existing csv file.

    This is a solution for the csv files where they are treated as Data Source however the data in the DLA will be out of synch, since I can't delete I will have 2 versions of the same data (file1 and file2) and I care only about the last file that got processed.

    So I'm kinda still stuck !!!


    Codernater

    Saturday, November 12, 2016 1:04 AM
  • You can create new partitions using ALTER TABLE ADD PARTITION after the inital table/partition creation. Here is an example: https://github.com/Azure/usql/blob/master/Examples/AmbulanceDemos/AmbulanceDemos/5-Ambulance-StreamSets-PartitionedTables/5.4-Ambulance-CreatePartitionedTable.usql


    Jorg Klein's Microsoft Business Intelligence Blog

    Monday, November 14, 2016 1:44 PM