none
string concatenation in where clause

    Question

  • The query below runs in less than a second in a SQL Server 2000 environment. When we upgraded to SQL Server 2005 the query takes about 8 minutes to run. I am trying to figure out why. There are about 5K rows in each table of the query. Also there are non-clustered concatenated indexes on each table for the columns involved in the where statement. The query runs fine in the 2005 environment when the suffix column is not concantenated.

    SELECT

    (PREFIX+ SUFFIX) FROM CDD.CADASTRALPARCELS
    WHERE ((PREFIX+ SUFFIX) NOT IN
    (SELECT (Parcel_ID+ Parcel_Suffix)
    FROM CDD.LANDUSE_ASSESSOR))

    Tuesday, June 30, 2009 6:35 PM

Answers

  • Try this instead:

    SELECT (Prefix+Suffix)
    FROM CDD.Cadastralparcels P
    WHERE NOT EXISTS
       (SELECT *
        FROM CDD.LandUse_Assessor
        WHERE Parcel_ID=P.Prefix AND Parcel_Suffix=P.Suffix)

    Does that improve the performance?

    I can't speak for how/why SQL2005 handles this differently.  Perhaps someone else can chime in about that.

    --Brad
    Tuesday, June 30, 2009 6:38 PM
    Moderator
  • I was going to post almost identical query, but Brad beat me.

    :-)

    EDIT:

    LOL! 

    I guess Alejandro also agrees.

    :-)


    Kent Waldrop

    Hey Kent (and Alejandro)...

    Do you know if SQL2000 took advantage of a string concatenation in using a multi-column index and SQL2005 stopped that behavior?  I don't have SQL2000 to test that out.

    --Brad

    You both beet me.

    Regards your question. We have never had that feature. Remember, it is recomended not to manipulate the columns in the predicate, in order to let the optimizer uses index statistics to estimate the selectivity of the predicate.

    Since SS 2005, you can create a computed column, add an index by this column, and the optimizer can recognize the formula in the predicate if it is used.


    Example:

    use tempdb
    go
    
    create table #t1 (
    c1 int not null identity primary key,
    c2 varchar(25) not null,
    c3 varchar(25) not null,
    c4 as c2 + c3 PERSISTED not null
    );
    go
    
    set nocount on;
    go
    
    insert into #t1(c2, c3) values('Microsoft SQL ', 'Server 11');
    go
    
    insert into #t1(c2, c3) values('Microsoft ' + left(newid(), 5), 'Server 11');
    go 1000
    
    set nocount off;
    go
    
    create index ix_t_c4
    on #t1(c4);
    go
    
    set showplan_text on;
    go
    
    select
    	c2, c3
    from
    	#t1
    where c2 + c3 = 'Microsoft SQL Server 11';
    
    select
    	c2, c3
    from
    	#t1
    where c4 = 'Microsoft SQL Server 11';
    go
    
    set showplan_text off;
    go
    
    drop table #t1;
    go
    


    This is one of those best practice, do not manipulate columns in the predicate, that you can work around using a computed column, if neccesary.

    Does this help?


    AMB
    Tuesday, June 30, 2009 7:13 PM
    Moderator

All replies

  • Try this instead:

    SELECT (Prefix+Suffix)
    FROM CDD.Cadastralparcels P
    WHERE NOT EXISTS
       (SELECT *
        FROM CDD.LandUse_Assessor
        WHERE Parcel_ID=P.Prefix AND Parcel_Suffix=P.Suffix)

    Does that improve the performance?

    I can't speak for how/why SQL2005 handles this differently.  Perhaps someone else can chime in about that.

    --Brad
    Tuesday, June 30, 2009 6:38 PM
    Moderator
  • I was going to post almost identical query, but Brad beat me.

    :-)

    EDIT:

    LOL! 

    I guess Alejandro also agrees.

    :-)


    Kent Waldrop
    Tuesday, June 30, 2009 6:39 PM
  • Use EXISTS operator instead.

    select *
    from CDD.CADASTRALPARCELS as a
    where not exists (
    select *
    from CDD.LANDUSE_ASSESSOR as b
    where b.Parcel_ID = a.PREFIX and b.Parcel_Suffix = a.SUFFIX
    );


    AMB

    Tuesday, June 30, 2009 6:40 PM
    Moderator
  • I was going to post almost identical query, but Brad beat me.

    :-)

    EDIT:

    LOL! 

    I guess Alejandro also agrees.

    :-)


    Kent Waldrop

    Hey Kent (and Alejandro)...

    Do you know if SQL2000 took advantage of a string concatenation in using a multi-column index and SQL2005 stopped that behavior?  I don't have SQL2000 to test that out.

    --Brad
    Tuesday, June 30, 2009 6:48 PM
    Moderator
  • Thanks so much, using the EXIST operator corrected the problem
    Tuesday, June 30, 2009 6:54 PM
  • >>I can't speak for how/why SQL2005 handles this differently.  Perhaps someone else can chime in about that

    Hi Brad..

    Your solution uses a correlated sub query. 4Putts' version creates a result set that has to be scanned against. So long as parcel_id and parcel_suffix are indexed, your version of the code should always perform better.

    JP
    Tuesday, June 30, 2009 6:57 PM
  • >>I can't speak for how/why SQL2005 handles this differently.  Perhaps someone else can chime in about that

    Hi Brad..

    Your solution uses a correlated sub query. 4Putts' version creates a result set that has to be scanned against. So long as parcel_id and parcel_suffix are indexed, your version of the code should always perform better.

    JP

    I understand that, but 4Putts was implying that SQL2000 was able to see that concatenation of the fields and apparently was able to take advantage of an index on (Parcel_ID,Parcel_Suffix), and the implication was that SQL2005 did NOT do that (since its performance was much poorer).

    I haven't gotten a confirmation on that, though, and unfortunately I can't test it.

    However, with all that being said, I believe that a IN query produces the exact same query plan as an EXISTS query (when it comes to a single column). 

    Check out the plans for these 2 queries in Northwind... they're identical:

    select companyname from customers where customerid in (select customerid from orders)
    select companyname from customers where exists (select * from orders where customerid=customers.customerid)

    (However, conversely, a NOT IN query does NOT produce the same plan as a NOT EXISTS query). 

    --Brad
    Tuesday, June 30, 2009 7:12 PM
    Moderator
  • I was going to post almost identical query, but Brad beat me.

    :-)

    EDIT:

    LOL! 

    I guess Alejandro also agrees.

    :-)


    Kent Waldrop

    Hey Kent (and Alejandro)...

    Do you know if SQL2000 took advantage of a string concatenation in using a multi-column index and SQL2005 stopped that behavior?  I don't have SQL2000 to test that out.

    --Brad

    You both beet me.

    Regards your question. We have never had that feature. Remember, it is recomended not to manipulate the columns in the predicate, in order to let the optimizer uses index statistics to estimate the selectivity of the predicate.

    Since SS 2005, you can create a computed column, add an index by this column, and the optimizer can recognize the formula in the predicate if it is used.


    Example:

    use tempdb
    go
    
    create table #t1 (
    c1 int not null identity primary key,
    c2 varchar(25) not null,
    c3 varchar(25) not null,
    c4 as c2 + c3 PERSISTED not null
    );
    go
    
    set nocount on;
    go
    
    insert into #t1(c2, c3) values('Microsoft SQL ', 'Server 11');
    go
    
    insert into #t1(c2, c3) values('Microsoft ' + left(newid(), 5), 'Server 11');
    go 1000
    
    set nocount off;
    go
    
    create index ix_t_c4
    on #t1(c4);
    go
    
    set showplan_text on;
    go
    
    select
    	c2, c3
    from
    	#t1
    where c2 + c3 = 'Microsoft SQL Server 11';
    
    select
    	c2, c3
    from
    	#t1
    where c4 = 'Microsoft SQL Server 11';
    go
    
    set showplan_text off;
    go
    
    drop table #t1;
    go
    


    This is one of those best practice, do not manipulate columns in the predicate, that you can work around using a computed column, if neccesary.

    Does this help?


    AMB
    Tuesday, June 30, 2009 7:13 PM
    Moderator
  • By the way, yes, you can create an index on ANY expression in FoxPro... I did a FoxPro-related SQL presentation 2 weeks ago at a Users Group and a few SQL Server people were there and their eyes got wide when I created an index on some complicated expression, and they said, "Wow, you can do that?". 

    I understand about the computed columns and how you can create an index on that.

    But this is what I meant by my question:

    A table has two columns, A and B.

    If you create a non-clustered index on (A,B), did SQL2000 somehow take advantage of that index in 4Putt's IN query (WHERE X+Y IN (SELECT A+B FROM MyTable))?

    That's what he seemed to be implying since he said that the same query took less than a second in SS2000, but over 8 minutes in SS2005.


    --Brad
    Tuesday, June 30, 2009 7:22 PM
    Moderator
  • To the OP,

    I dont think the optimizer in SQL 2000 was able to handle a concatenated column values better than SQL 2005.  I think the difference here is the join, as a concatenated column value is not a good candidate for a higly performant search argument in SQL 2000 or SQL 2005.  I believe SQL 2000 probably did a hash join, while SQL Server 2005 used a nested loop join.  The difference is the nest loop join has to be evaluated by every row in the outer query. I believe that using the values in the exists clauses allowed, in this case, the optimizer to pick a better query plan.   We would need the execution plans of each query to compare though.


    http://jahaines.blogspot.com/
    Tuesday, June 30, 2009 7:23 PM
    Moderator
  • By the way, yes, you can create an index on ANY expression in FoxPro... I did a FoxPro-related SQL presentation 2 weeks ago at a Users Group and a few SQL Server people were there and their eyes got wide when I created an index on some complicated expression, and they said, "Wow, you can do that?". 

    I understand about the computed columns and how you can create an index on that.

    But this is what I meant by my question:

    A table has two columns, A and B.

    If you create a non-clustered index on (A,B), did SQL2000 somehow take advantage of that index in 4Putt's IN query (WHERE X+Y IN (SELECT A+B FROM MyTable))?

    That's what he seemed to be implying since he said that the same query took less than a second in SS2000, but over 8 minutes in SS2005.


    --Brad

    I edited my post, after re-reading your question.


    AMB
    Tuesday, June 30, 2009 7:26 PM
    Moderator
  • Thanks so much, using the EXIST operator corrected the problem

    If it solved the problem, can you mark at least Brad's suggestion as the answer, please?


    AMB
    Tuesday, June 30, 2009 7:29 PM
    Moderator

  • > (However, conversely, a NOT IN query does NOT produce the same plan as a NOT EXISTS query). 

    It does, you need to eliminate the possibilities for NULLs... :)

    SELECT companyname 
    FROM customers 
    WHERE customerid NOT IN (SELECT customerid 
                             FROM orders 
                             WHERE CustomerID IS NOT NULL);
                             
    SELECT companyname 
    FROM customers 
    WHERE NOT EXISTS (SELECT * 
                      FROM orders 
                      WHERE customerid = customers.customerid);

    Plamen Ratchev
    Tuesday, June 30, 2009 9:05 PM
    Moderator
  • >
    >It does, you need to eliminate the possibilities for NULLs... :)
    >

    I know... I meant a straight vanilla NOT IN and NOT EXISTS (without adding an extra IS NOT NULL to the WHERE clause of the IN query).


    --Brad
    Tuesday, June 30, 2009 9:11 PM
    Moderator
  • Yes, but the possibility to have NULLs makes them logically different queries that can return different results, and this is why you get different execution plans. Adding the IS NOT NULL predicate makes them equivalent queries.
    Plamen Ratchev
    Tuesday, June 30, 2009 9:19 PM
    Moderator
  • I understand that.

    I guess I just wanted to make the point to John that even though IN = EXISTS, one should not assume that NOT IN = NOT EXISTS, because they are NOT the same and will NOT behave the same.

    (However, as you pointed out, you CAN make them equivalent by accounting adding the IS NOT NULL to the NOT IN query).

     


    --Brad
    Tuesday, June 30, 2009 9:28 PM
    Moderator
  • Thanks AMB this helped me solve my long pending problem

    Ajay_K

    Saturday, May 22, 2010 7:23 AM