locked
Deleting many rows from a big table MySql 5.5.46 RRS feed

  • Question

  • User-1453200658 posted

    Hi,

    The following statement deletes duplicate rows and keeps the highest id

    DELETE t1 FROM contacts t1
    INNER JOIN contacts t2 
    WHERE 
        t1.id < t2.id AND 
        t1.email = t2.email;
    

    This query references the contacts table twice, therefore, it uses the table alias t1 and t2.

    But I have a 6 millions rows table and we need to clean it.

    My first approach was create a SP with this lines

    REPEAT
    DELETE t1 FROM contacts t1
    INNER JOIN contacts t2 
    WHERE 
        t1.id < t2.id AND 
        t1.email = t2.email
    ORDER BY t1.id ASC LIMIT 10000;
    UNTIL ROW_COUNT() = 0 END REPEAT;
    

    The error is

    1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ORDER BY t1.id ASC LIMIT 10000' at line 17 Time: 0,063s

    Help me to do it.

    Sunday, February 28, 2021 4:04 PM

Answers

  • User503812343 posted

    LIMIT does not work with join ... you may copy your entire data to AWS  Athena remove data that is not required and then copy back to MySQL 

    Or Create a temp table with row_id to each row compare delete duplicate entries

    1. Create Temporary Table that holds just ids.

    CREATE TABLE id_temp_table ( temp_id int);

    1. Insert ids that should be removed:

    insert into id_temp_table (temp_id) select.....

    1. Create New table table_new

    2. Insert all records from table to table_new without unnecessary rows that are in id_temp_table

    insert into table_new .... where table_id NOT IN (select distinct(temp_id) from id_temp_table);

    1. Rename tables

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, March 3, 2021 2:22 PM