none
U-SQL Delete duplicated records

    Question

  • Is there a way to do the following SQL trick in U-SQL?

    WITH @TransactionsCTE AS
    (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY Ticket) AS RowNumber
    FROM @transactions
    )
    DELETE FROM @TransactionsCTE WHERE RowNumber > 1

    Any other suggestion to remove duplicate data on a particular columns (ticket) ?

    Thursday, December 29, 2016 11:45 PM

Answers

  • @transactions =
        SELECT *,
               ROW_NUMBER() OVER(PARTITION BY Ticket ORDER BY other_columns...) AS RowNumber
        FROM @transactions;
    @transactions =
        SELECT *
        FROM @transactions
        WHERE RowNumber == 1;
    
    

    I recommend to program defensively and use ORDER BY in the PARTITION clause. If non-identical 'duplicates' ever appear in the future, this will protect you somewhat from non-reproducible outcomes, which are often very painful to nail down.

    • Marked as answer by Uri Kluk Monday, January 2, 2017 3:09 PM
    Friday, December 30, 2016 8:24 AM

All replies

  • @transactions =
        SELECT *,
               ROW_NUMBER() OVER(PARTITION BY Ticket ORDER BY other_columns...) AS RowNumber
        FROM @transactions;
    @transactions =
        SELECT *
        FROM @transactions
        WHERE RowNumber == 1;
    
    

    I recommend to program defensively and use ORDER BY in the PARTITION clause. If non-identical 'duplicates' ever appear in the future, this will protect you somewhat from non-reproducible outcomes, which are often very painful to nail down.

    • Marked as answer by Uri Kluk Monday, January 2, 2017 3:09 PM
    Friday, December 30, 2016 8:24 AM
  • Thank you Alexandre! this is what I needed and kudos for the advice on defensive programming!

    • Edited by Uri Kluk Monday, January 2, 2017 3:10 PM
    Monday, January 2, 2017 3:09 PM