locked
Need help to speed up a qurey. RRS feed

  • Question

  • I have two tables (table1 and table2) which both have similar data.  Both table1 and table2 have over 100000 records each in them. I need to concatenate three columns that are in table1 and compare them to three similar concatenated columns that are in table2.  The result that I am looking for is a list of items that are in table1 but not in table2.  I have used the following query which works but is very slow, sometimes it takes over 60 seconds to run:


    SELECT DISTINCT
      Lower(Division + ' ' + Branch + ' ' + Program) AS Combined
    FROM
      table1
    WHERE
        Lower(Division + ' ' + Branch + ' ' + Program) NOT IN
            (SELECT Lower(Division + ' ' + Branch + ' ' + Program) FROM table1)

    I am wondering if there is a better/faster way to get the list that I need, possibly by using a join?

    Thanks,
    • Moved by Papy Normand Thursday, May 30, 2013 8:53 PM query creation ( coming from the SQL Server Data Access forum )
    Thursday, May 30, 2013 7:12 PM

Answers

  • Try using EXISTS instead, e.g.

    SELECT DISTINCT Lower(Division + ' ' + Branch + ' ' + Program) AS Combined FROM table1 T1 WHERE NOT EXISTS (SELECT 1 FROM table2 T2 where T2.Division = T1.Division

    AND T2.Branch=T1.Branch and T2.Program = T1.Program)



    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    • Marked as answer by smokewagon Thursday, May 30, 2013 9:24 PM
    Thursday, May 30, 2013 9:14 PM

All replies

  • Hello smokevagan ,

    I suppose that your database is belonging to a SQL Server.

    Please, could you provide more pieces of information about your problem ?

    - the full version of your SQL Server ( 2005, 2008 , 2008 R2 , 2012 ) . Each new version is bringing novelties and it would be useless to provide you a solution not supported by your version

    - the definition of your tables ( the CREATE TABLE including the definition of each column )

    As your thread is related purely to optimization of a query, I prefer to move your thread towards the Transact-SQL forum which is the most appropriate forum for this kind of problem. I hope you will find a full and quick answer in this new forum.

    Have a nice day


    Mark Post as helpful if it provides any help.Otherwise,leave it as it is.

    Thursday, May 30, 2013 8:52 PM
  • Try using EXISTS instead, e.g.

    SELECT DISTINCT Lower(Division + ' ' + Branch + ' ' + Program) AS Combined FROM table1 T1 WHERE NOT EXISTS (SELECT 1 FROM table2 T2 where T2.Division = T1.Division

    AND T2.Branch=T1.Branch and T2.Program = T1.Program)



    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    • Marked as answer by smokewagon Thursday, May 30, 2013 9:24 PM
    Thursday, May 30, 2013 9:14 PM
  • Thanks Naomi - That worked.  If you don't mind, please explain why it is so much faster.  It only took 1 second, my query took 1:31 this time.

    Thursday, May 30, 2013 9:26 PM
  • Your first query was not sargable and even if you had indexes on these columns, it could not have used them. The second query can use indexes on these columns. Also, I prefer using not exists vs. not in every time - see this blog post

    Why you should never use IN/NOT IN in SQL

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    • Proposed as answer by Papy Normand Friday, May 31, 2013 6:41 AM
    Thursday, May 30, 2013 9:30 PM
  • Thnks again, I will use this info tht you povided.  It looks like I need to make chnges to several queres.

    Thursday, May 30, 2013 9:41 PM
  • You can add a new computed column (Division + ' ' + Branch + ' ' + Program) and make it persisted. 

    Please refer the below link

    http://msdn.microsoft.com/en-us/library/ms191250(v=sql.105).aspx

    Thursday, May 30, 2013 10:03 PM
  • Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Learn how to follow ISO-11179 data element naming conventions and formatting rules (you were not even close). Nothing you had done is standard or good design. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect. 

    >> I have two tables (table1 and table2) which both have similar data. <<

    This is a bad sign. It could a sign of attribute splitting, a common design flaw of Noobs. 

    >> Both table1 and table2 have over 100,000 records [sic] each in them. <<

    Rows are not anything like records. The code smell is getting worse. 

    >> I need to concatenate three columns that are in table1 and compare them to three similar concatenated columns that are in table2.  <<

    NO! NO! NO! A column models a single attribute that uses a scalar data type. You are writing 1960's COBOL and probably do not even know it. The DDL should have formatted the data to the proper casing, removed extra spaces, validated it again another table, etc. You could not do that in COBOL, but this is SQL

    >> The result that I am looking for is a list [sic: a list is a CSV structure or a linked chain pointer chain] of items that are in table1 but not in table2.  <<

    Here is how to get a table 

    SELECT X.*
      FROM (SELECT division_name, branch_name, program_name
              FROM Foo 
            EXCEPT 
            SELECT division_name, branch_name, program_name
            FROM Bar) AS X;

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Friday, May 31, 2013 12:23 AM