none
What do you do when an index isn’t the answer?

    Question

  • I have a simple query. Column names are irrelevant the * is just for example.

    SELECT *
    FROM Contacts C

    JOIN StageTable ST

    ON C.Email = ST.Email

    WHERE C.CatID = ST.CatID

    This has ran fine in the past but now suddenly runs crazy slow.

                                          

    StageTable has 3 records. Contacts has 19 million. Email is of course varchar but CatID is an int. I discovered that Contacts didn’t have an index on the email column. I added one but no dice. It didn’t run any faster. Now what?! This query is so simple I’m at a loss on how to speed it up.

    Monday, April 29, 2013 2:26 PM

Answers

  • While designing index, analysing  column used in select part is impotent. since you haven't mentioned any thing and only 3 row in StageTable table, I would assuming you selecting records from Contacts.. I would rewrite the code to

    SELECT *
    FROM StageTable ST
    WHERE EXISTS (SELECT 1 FROM  Contacts C
             where C.CatID = ST.CatID AND C.Email = ST.Email)

    Create an index on  StageTable  CatID column  and add  Email column in include . If  CatID is the PK the create an index on email column

    vt

     

    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

    • Marked as answer by falcon00 Monday, April 29, 2013 4:14 PM
    • Unmarked as answer by falcon00 Monday, April 29, 2013 4:35 PM
    • Marked as answer by falcon00 Wednesday, May 01, 2013 2:13 PM
    Monday, April 29, 2013 3:26 PM
  • The ultimate solution wound up being putting an index on both the Contacts table and the Stage table consisting of email and CatID. We ran it several times yesterday and this seems to be stable. I'll back through and mark those replys that helped me get there.
    • Marked as answer by falcon00 Wednesday, May 01, 2013 2:12 PM
    Wednesday, May 01, 2013 2:11 PM

All replies

  • Could you post the graphical execution plan..

    vt


    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

    Monday, April 29, 2013 2:28 PM
  • Hello,

    Have you check the execution plan, if the new index is used / or it may show up a missing index?

    Have you created the new index only for the column "email"? See, the following equals your query, and by the join predict you should create one index with the both columns CatID + EMail.

    SELECT *
    FROM Contacts C
         INNER JOIN 
         StageTable ST
             ON C.Email = ST.Email AND C.CatID = ST.CatID


    Olaf Helper

    Blog Xing

    Monday, April 29, 2013 2:33 PM
  • Since you don't have a WHERE clause, only a join criteria between the tables, this is going to do a table scan of StageTable.  No amount of indexing will help this query.

    What exactly are you trying to do?

    Monday, April 29, 2013 2:53 PM
  • Monday, April 29, 2013 3:06 PM
  • This is only the estimated. The query won't run so I can't even see the actual.
    Monday, April 29, 2013 3:07 PM
  • The email is the key value. CatID is just a limiter. I was taught that limiters go in the where clause and key values go in the join. I'll experiment with doing an index with more than one column but let me answer Tom first.
    Monday, April 29, 2013 3:08 PM
  • I'm trying to take the three records and match the to the Contacts table based on email address (business requirement) and the CatID because one email could be in more than one category. The reason I'm trying to match on email is because we're missing the actual key value that we really need so this is the second attempt to make a match. It really is that simple and has run quickly in the past. For some reason this just stopped working fast.
    Monday, April 29, 2013 3:11 PM
  • Try this,

    SELECT *
    FROM Contacts C
    where exists ( select 1 from  StageTable ST
    WHERE C.CatID = ST.CatID and 
     C.Email = ST.Email)

    Also, instead of * use specific column names( even if you dont publish here try once to verify) and what is the clustered Index key on Contacts table?


    Thanks
    Sarat

    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Monday, April 29, 2013 3:12 PM
  • While designing index, analysing  column used in select part is impotent. since you haven't mentioned any thing and only 3 row in StageTable table, I would assuming you selecting records from Contacts.. I would rewrite the code to

    SELECT *
    FROM StageTable ST
    WHERE EXISTS (SELECT 1 FROM  Contacts C
             where C.CatID = ST.CatID AND C.Email = ST.Email)

    Create an index on  StageTable  CatID column  and add  Email column in include . If  CatID is the PK the create an index on email column

    vt

     

    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

    • Marked as answer by falcon00 Monday, April 29, 2013 4:14 PM
    • Unmarked as answer by falcon00 Monday, April 29, 2013 4:35 PM
    • Marked as answer by falcon00 Wednesday, May 01, 2013 2:13 PM
    Monday, April 29, 2013 3:26 PM
  • It doesn't like your syntax. I think that technique requires that you use an aggragated function. In any case it doesn't like that I'm joining to the Contacts table inside the sub query. The clustered index is the primary key of the Contacts table.
    Monday, April 29, 2013 3:30 PM
  • VT and Surat.

    Msg 156, Level 15, State 1, Line 4
    Incorrect syntax near the keyword 'select'.
    Msg 102, Level 15, State 1, Line 7
    Incorrect syntax near ')'.

    Monday, April 29, 2013 3:33 PM
  • Check this out. I'm creating an index in the stage table on catID, and email. The index creation process is taking an EXCESSIVE amount of time. What if there is something wrong with my stage table?! How would I even check that?!
    Monday, April 29, 2013 3:40 PM
  • Check this out. I'm creating an index in the stage table on catID, and email. The index creation process is taking an EXCESSIVE amount of time. What if there is something wrong with my stage table?! How would I even check that?!

    Check for locking.. 

    correction to my previous post

    SELECT *
    FROM Contacts C
    where exists ( select 1 from StageTable ST
    WHERE C.CatID = ST.CatID and
    C.Email = ST.Email)

    vt


    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker

    • Marked as answer by falcon00 Monday, April 29, 2013 4:13 PM
    • Unmarked as answer by falcon00 Monday, April 29, 2013 4:35 PM
    Monday, April 29, 2013 3:43 PM
  • I agree with vt.  There may well be locks on the stage table.  Maybe the program that built it inserted the rows in transaction state and never did a commit?  You said the stage table has only 3 rows.  If so, without locks, creating an index should be virtually instantaneous.   But its also true that an index wwill make no difference on a 3 row table.

    You can try checking for problems in the stage table with DBCC CHECKTABLE(<table name>)

    Tom

    • Marked as answer by falcon00 Monday, April 29, 2013 4:13 PM
    • Unmarked as answer by falcon00 Monday, April 29, 2013 4:35 PM
    Monday, April 29, 2013 3:57 PM
  • It's all good on the syntax. I misspelled EXISTS and SSMS didn't flag it.

    It turns how there are a BUNCH of locks! There are four of them all doing the same thing. Creating the Stored Proc that has the query that I'm trying to run. It seems like commiting transactions is once again biting me in the butt.

    Monday, April 29, 2013 4:03 PM
  • Ok so I discovered the locks on the stage table. Blew them away. I don't understand why those locks where there. They were creating the stored procedure that the above query is a part of. I thought maybe I might be calling that proc with recompile but no. All of my procs are carefully wrapped in begin/commit transaction statements.

    After I blew away the locks the query still didn't work right. Added the indexes as suggested on the StageTable and now it works properly.

    The immediate problem has been solved but the root cause is still a mystery. Additionally I don't understand why indexes on the tiny table made it work but not on the huge table where placing an index would have made more sense.


    • Marked as answer by falcon00 Monday, April 29, 2013 4:13 PM
    • Edited by falcon00 Monday, April 29, 2013 4:15 PM
    • Unmarked as answer by falcon00 Monday, April 29, 2013 4:35 PM
    Monday, April 29, 2013 4:13 PM
  • Spoke too soon. Running my package again it chokes in the same spot. This time there are no locks. This is crazy.
    Monday, April 29, 2013 4:36 PM
  • Ok so I discovered the locks on the stage table. Blew them away. I don't understand why those locks where there. They were creating the stored procedure that the above query is a part of. I thought maybe I might be calling that proc with recompile but no. All of my procs are carefully wrapped in begin/commit transaction statements.

    After I blew away the locks the query still didn't work right. Added the indexes as suggested on the StageTable and now it works properly.

    The immediate problem has been solved but the root cause is still a mystery. Additionally I don't understand why indexes on the tiny table made it work but not on the huge table where placing an index would have made more sense.


    One thing you might look at is the process that called this procedure.  If the stored proc does a BEGIN TRAN and a COMMIT, that COMMIT only frees the locks if you were not in a transaction when you called the stored proc.  SQL Server allows you to nest BEGIN TRAN's, but it does not really have nested transactions.  Instead, it just keeps a counter of how many unmatched BEGIN TRAN's there have been (the current value is available in @@TRANCOUNT). 

    When you do a BEGIN TRAN and @@TRANCOUNT is equal to 0, that sets @@TRANCOUNT to 1 and puts you in transaction state.  If you do a BEGIN TRAN with @@TRANCOUNT greate than zero, all SQL does is add 1 to @@TRANCOUNT.

    When you do a COMMIT, SQL looks at @@TRANCOUNT.  If it is zero, you get an error.  If it is 1, the transaction is committed and all the locks are freed.  But if is greater than 1, all SQL does is subtract 1 from @@TRANCOUNT.  It doesn't commit anything and it doesn't free any locks.

    When you do a ROLLBACK, SQL looks at @@TRANCOUNT.  If it is zero, you get an error.  If it is any other value, SQL rolls back everything to the BEGIN TRAN that started the transaction (no matter how deep in multiple BEGIN TRAN's you are) and frees all the locks.

    So if your calling code does a BEGIN TRAN, then calls your procedure.  And your procedure does a BEGIN TRAN, makes some changes to the database and does a COMMIT and returns to your calling code, at that point in time all the rows locked by the stored procedure are still locked.  And they will stay locked until the calling code does a COMMIT or ROLLBACK.

    Tom


    • Edited by Tom Cooper Monday, April 29, 2013 4:50 PM
    • Proposed as answer by Naomi NModerator Monday, April 29, 2013 4:50 PM
    • Unproposed as answer by falcon00 Wednesday, May 01, 2013 2:12 PM
    Monday, April 29, 2013 4:49 PM
  • It's sitting in an SSIS package and I'm starting to wonder if I need to move the discussion. When I run the package now NO locks are created but it is still taking an excessive amount of time to run (so long that it's never completed even once). It's sitting in an execute SQL task. All transactions should be closed before we get to that step. Also I've verified that it works fine with other data. There is something about this particular dataset were I'm being forced to match on emails. The QC guy let it run one time and it took literally HOURS to process three records.

    Is there something about SSIS such that it would begin a transaction BEFORE executing a stored proc?!

    Monday, April 29, 2013 6:01 PM
  • Aaaand I take it back. Locks are being created just randomly. I'm not seeing any kind of established pattern. For some reason SSIS is running a create procedure script for no apparent reason.
    Monday, April 29, 2013 6:02 PM
  • More discoveries in my research of the problem. The original query from up above runs super slow BUT if I just say

    SELECT *
    FROM Contacts C
    WHERE C.Email = 'known email'
    AND C.CatID = Known CatID

    This comes back in a decent amount of time.

    Both queries were performed with and with out existing locks on the StageTable.

    Monday, April 29, 2013 6:58 PM
  • Also, you should check whether column email has the same declaration in both tables. For example, if the one in Contacts is varchar while the one in StageTable is nvarchar, or if they have different collations, then the optimizer might not choose its index.

    And, if the Contracts email column is in fact a key, then make sure it is indexed with a unique index. That also makes it more likely that it will be used.


    Gert-Jan

    Tuesday, April 30, 2013 8:42 AM
  • The ultimate solution wound up being putting an index on both the Contacts table and the Stage table consisting of email and CatID. We ran it several times yesterday and this seems to be stable. I'll back through and mark those replys that helped me get there.
    • Marked as answer by falcon00 Wednesday, May 01, 2013 2:12 PM
    Wednesday, May 01, 2013 2:11 PM