Answered by:
Very slow SQL query

Question
-
I am using SQL Server 2008 R2 and SQL Server Management Studio 10.50.1617.0
I have a database with numerous tables. I am querying a table with > 60,000 rows and 29 columns of types int, tinyint, nvarchar and a datetime.
I am writing a query to list possible duplicates in the table but, depending on the columns chosen, it was sometimes taking a very long time to run. I simplified the query to find the minimum query that will have the problem. The following query:
select *
from salesLead a
where (select count(*) from salesLead b
where
a.email = b.email and
a.address = b.address and
isnull(a.nation, 'Australia') = isnull(b.nation, 'Australia')) > 1takes almost 10 minutes to run. Each of the fields are nullable nvarchar fields of different lengths.
If I remove any of the 3 lines in the nested where clause then it will take 1-2 seconds. If I remove the isnull clause from b.nation then it will take 1-2 seconds.
Can anyone explain why this combination takes so long to process?
Tuesday, September 13, 2011 4:25 AM
Answers
-
Hi
The basic problem with your query is that it is iterating over the entire data set for each row in the data set (correlated subquery). There are a few better was of doing this in one set operation. Try this:
select * from ( select *, ROW_NUMBER() over(partition by email, address, isnull(nation, 'Australia') order by email, address, isnull(nation, 'Australia')) as RowNum from SalesLead ) a where RowNum > 1
this makes use of the newer row ranking functions built into TSQL
Craig Bryden - Please mark correct answers- Edited by Craig BrydenMVP Tuesday, September 13, 2011 4:39 AM
- Proposed as answer by DVR Prasad Tuesday, September 13, 2011 4:42 AM
- Marked as answer by Naomi N Sunday, December 23, 2012 8:28 PM
Tuesday, September 13, 2011 4:38 AM
All replies
-
Hi
The basic problem with your query is that it is iterating over the entire data set for each row in the data set (correlated subquery). There are a few better was of doing this in one set operation. Try this:
select * from ( select *, ROW_NUMBER() over(partition by email, address, isnull(nation, 'Australia') order by email, address, isnull(nation, 'Australia')) as RowNum from SalesLead ) a where RowNum > 1
this makes use of the newer row ranking functions built into TSQL
Craig Bryden - Please mark correct answers- Edited by Craig BrydenMVP Tuesday, September 13, 2011 4:39 AM
- Proposed as answer by DVR Prasad Tuesday, September 13, 2011 4:42 AM
- Marked as answer by Naomi N Sunday, December 23, 2012 8:28 PM
Tuesday, September 13, 2011 4:38 AM -
Hi IanExpanz,
please add this statement just before your query:
set statistics io on
This will produce a summary of how many read operations was needed to execute the query in the messages tab of SSMS.
Also, please have a look at the execution plan (mark your query and press Ctrl-L in SSMS).
May be you can modify your query by using the EXISTS syntax.
regards
"It's time to kick ass and chew bubble gum... and I'm all outta gum." - Duke NukemTuesday, September 13, 2011 4:42 AM -
Hi, try please :
;with CTE as (select *,ROW_NUMBER() over (partition by email,address,nation order by email) as rn From salesLead) select * from CTE C1 where exists(select 1 from CTE C2 Where C1.email=C2.email and C1.adress=C2.adress and isnull(C1.nation,'Australia')=isnull(nation,'Australia') and C2.rn=2)
Best regards
- Edited by Badii Gharbi Tuesday, September 13, 2011 5:01 AM
Tuesday, September 13, 2011 4:43 AM -
Hi
Thanks for the post. Let us also know about the index details of the table salesLead. Index does play a major role when there is a performance issue. I suggest you create a combined index for (email, address, nation), the query would go for index scan rather than clustered index scan.
Pls mark as answer, if this helps.
- KerobinTuesday, September 13, 2011 4:55 AM -
;WITH CTE AS ( SELECT *, COUNT(1) OVER (PARTITION BY email, [address], ISNULL(nation,'Australia')) cnt FROM salesLead ) SELECT * FROM CTE WHERE cnt>1
Tuesday, September 13, 2011 4:59 AM -
Also unicode comparison are resource intensive operation which may result in table scan also.
Thanks and regards, Rishabh , Microsoft Community ContributorTuesday, September 13, 2011 5:12 AM -
>isnull(a.nation, 'Australia') = isnull(b.nation, 'Australia'))
The above predicate is not SARGable.
SARGable article: http://www.sqlusa.com/bestpractices/sargable/
Kalman Toth, SQL Server & Business Intelligence Training; sqlusa.comTuesday, September 13, 2011 5:28 AM -
Hi,
Without more details like the execution plan for the query, it is hard to say exactly what the reason for the performance hit.
Kalman's answer is the closest to being right. I.e., because it is a correlated subquery, a nested join may be used. And since the predicate is not SARGable, a table scan is used.
Here's a query (to find duplicates) that won't use a nested join and so a SARGable argument is not necessary.
select email, address, isnull(a.nation, 'Australia'), count(*) from salesLead group by email, address, isnull(a.nation, 'Australia') having count(*) > 1
if possible, create a index on the email, address and nation column to speed it up.
HTH,
An
- Edited by anphu Wednesday, December 12, 2012 10:18 PM mis-spellings
Wednesday, December 12, 2012 10:16 PM -
Please look at a SSIS/SSAS. These tools might be faster and more accurate.
Monday, December 17, 2012 11:35 PM -
Hi IanExpanz,
Plz try this...
select *
from salesLead a
where (select MIN(@@count) from salesLead b
where
a.email = b.email and
a.address = b.address and
isnull(a.nation, 'Australia') = isnull(b.nation, 'Australia')) > 1
- Edited by ameyjoe Tuesday, December 18, 2012 9:19 AM
Tuesday, December 18, 2012 7:36 AM