none
Serious Performance Problems with INTERSECT using SQL 2005

    Question

  • I have a set of queries that I need to do an intersect on (could be up to 6 queries).  Any time that are more than 2 intersects the performance degrade significantly.    Here is the example

    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '5/15/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Conversions) = 0

    INTERSECT

    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '6/14/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Cost) > 300

    INTERSECT

    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '8/12/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Impressions) >= 3


    Each query on its own runs in about 2 seconds.  Any two of the three run together with an INTERSECT between then runs in about 2 seconds.   Add the 3rd query with the INTERSECT and it runs for 6 minutes.

    If I modify the query to be the following, where I put the results from the first INTERSECT into a temp table then intersect that result with the 3rd query the whole thing runs in about 2 seconds.  The problem is that since I need to build this dynamically and also need to support anywhere from 2 to 6 INTERSECT's the TEMP table solution is much more difficult to code.

     Any suggestions would be appreciated.

    Jeff Cleath
    Inetium, LLC

    Friday, August 14, 2009 6:48 AM

Answers

  • Are you getting parallelism?
    I think the query engine works on all queries simultaneously, and INTERSECTs on the fly, not when all queries are done and stored in a worktable.
    Try my suggestion and pleas post back the results.

    insert #temp1
    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '5/15/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Conversions) = 0 
    
    insert #temp2
    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '6/14/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Cost) > 300 
    
    insert #temp3
    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '8/12/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Impressions) >= 3 
    
    select * from #temp1
    intersect
    select * from #temp2
    intersect
    select * from #temp3
    
    • Marked as answer by jcleath Friday, August 14, 2009 12:35 PM
    Friday, August 14, 2009 7:46 AM
  • Also, have you tried this rewrite? The table is only read once...


    SELECT		DataSourceName,
    		Campaign,
    		AdGroup,
    		Keyword,
    		MatchType,
    		KeywordID,
    		AdGroupID,
    		DataSourceID,
    		MaximumCPC,
    		KeywordType
    FROM		MonitorTable WITH (NOLOCK)
    WHERE		ClientID = 51
    		AND Keyword NOT LIKE '%unmatched%'
    		AND [Date] >= '5/15/2009'			-- Some further optimization
    GROUP BY	DataSourceName,
    		Campaign,
    		AdGroup,
    		Keyword,
    		MatchType,
    		KeywordID,
    		AdGroupID,
    		DataSourceID,
    		MaximumCPC,
    		KeywordType
    HAVING		SUM(CASE WHEN [Date] >= '5/15/2009' THEN Conversions ELSE 0 END) = 0 
    		AND SUM(CASE WHEN [Date] >= '6/14/2009' THEN Cost ELSE 0 END) > 300
    		AND SUM(CASE WHEN [Date] >= '8/12/2009' THEN Impressions ELSE 0 END) >= 3
    • Edited by SwePesoMVP Friday, August 14, 2009 7:54 AM
    • Marked as answer by jcleath Friday, August 14, 2009 1:07 PM
    Friday, August 14, 2009 7:52 AM

All replies

  • I would guess it's not INTERSECT that is the problem.
    It's grouping by 10 columns, three times.

    Try inserting all the queries in a separate temp table, and then at the end, INTERSECT them.
    Friday, August 14, 2009 7:01 AM
  • I appreciate the response.  But as I mentioned in the post, if I do either 2 of the queries together with the INTERSECT or two queries together with and INTERSECT to a temp table and then INTERSECT that temp table with the third query then it completed the work in 2 seconds.     Also, each query in itself returns in about 1 second.

    This tells me that the INTERSECT is the issue if it takes over 6 minutes.
    Jeff Cleath Inetium, LLC jcleath@inetium.com
    Friday, August 14, 2009 7:05 AM
  • Are you getting parallelism?
    I think the query engine works on all queries simultaneously, and INTERSECTs on the fly, not when all queries are done and stored in a worktable.
    Try my suggestion and pleas post back the results.

    insert #temp1
    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '5/15/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Conversions) = 0 
    
    insert #temp2
    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '6/14/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Cost) > 300 
    
    insert #temp3
    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '8/12/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Impressions) >= 3 
    
    select * from #temp1
    intersect
    select * from #temp2
    intersect
    select * from #temp3
    
    • Marked as answer by jcleath Friday, August 14, 2009 12:35 PM
    Friday, August 14, 2009 7:46 AM
  • Also, have you tried this rewrite? The table is only read once...


    SELECT		DataSourceName,
    		Campaign,
    		AdGroup,
    		Keyword,
    		MatchType,
    		KeywordID,
    		AdGroupID,
    		DataSourceID,
    		MaximumCPC,
    		KeywordType
    FROM		MonitorTable WITH (NOLOCK)
    WHERE		ClientID = 51
    		AND Keyword NOT LIKE '%unmatched%'
    		AND [Date] >= '5/15/2009'			-- Some further optimization
    GROUP BY	DataSourceName,
    		Campaign,
    		AdGroup,
    		Keyword,
    		MatchType,
    		KeywordID,
    		AdGroupID,
    		DataSourceID,
    		MaximumCPC,
    		KeywordType
    HAVING		SUM(CASE WHEN [Date] >= '5/15/2009' THEN Conversions ELSE 0 END) = 0 
    		AND SUM(CASE WHEN [Date] >= '6/14/2009' THEN Cost ELSE 0 END) > 300
    		AND SUM(CASE WHEN [Date] >= '8/12/2009' THEN Impressions ELSE 0 END) >= 3
    • Edited by SwePesoMVP Friday, August 14, 2009 7:54 AM
    • Marked as answer by jcleath Friday, August 14, 2009 1:07 PM
    Friday, August 14, 2009 7:52 AM
  • Thanks Peso, that was a good suggestion.  I tried your original exact format and received errors around
    Msg 208, Level 16, State 0, Line 1
    Invalid object name '#temp3'.

    But I changed it to

    SELECT * into #temp1 FROM
    (SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '5/15/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Conversions) = 0 ) A

    SELECT * into #temp2 FROM
    (SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '6/14/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Cost) > 300 ) B

    SELECT * into #temp3 FROM
    (SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '8/12/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Impressions) >= 3 ) C

    select * from #temp1
    intersect
    select * from #temp2
    intersect
    select * from #temp3

    And it worked in 2 seconds.

    Thanks

    Jeff Cleath


    Jeff Cleath Inetium, LLC jcleath@inetium.com
    Friday, August 14, 2009 12:35 PM
  • Peso, this also appears to be a great suggestion, thank you!

    This last idea is a pretty drastic change to how the queries were originally built (which I did not personally do so I need to more thoroughly validate the logic and results sets, but this also returned the same results in my initial testing in about 2 seconds.  So I am hopefull this is a valid approach because it would be simpler to extend.

    I will reply again one I have fully validated this approach.

    Thanks again.

    Jeff Cleath.

    Jeff Cleath Inetium, LLC jcleath@inetium.com
    Friday, August 14, 2009 12:39 PM
  • You can simplify the insert from

    SELECT * into #temp1 FROM
    (SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '5/15/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Conversions) = 0 ) AS A

    to this

    SELECT M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    INTO #Temp1
    FROM MonitorTable M1 WITH(NOLOCK)
    WHERE (M1.ClientID = 51 AND M1.Keyword NOT LIKE '%unmatched%') AND  M1.Date >= '5/15/2009'
    GROUP BY M1.DataSourceName, M1.Campaign, M1.AdGroup, M1.Keyword, M1.MatchType, M1.KeywordID, M1.AdGroupID, M1.DataSourceID, M1.MaximumCPC, M1.KeywordType
    HAVING SUM(M1.Conversions) = 0 
    
    Friday, August 14, 2009 1:05 PM
  • Thank you for your feedback.

    My second suggestion will scan the table only once, so there is much less change you get blocking issues using this query.
    The optimization trick for teh WHERE ... AND [Date]> '...'
    is that the date in the WHERE clause is the oldest date of all the original date filters.

    Friday, August 14, 2009 1:10 PM
  • Hi!

    Have you validated the results yet?
    Thursday, August 20, 2009 10:13 AM