Partition Elimination doesn't occur with CHAR column and basic queries?

    Question

  • It seems that partitioning on a CHAR column is not as straightforward as I imagined.

    I have several ODS and data warehouse databases where almost every table is queried with a char(3) column in the WHERE clause.  This column, Div, is the division (company) ID.  Thus, Div is a natural candidate for partitioning, not only for query performance but also for ETL efficiency, as companies send data late or out of band but then expect immediate gratification.

    During initial testing of partitioning on this CHAR(3) column, it became immediately apparent that our basic (automated) WHERE filtering does not work as expected.  On a table with a partition function defined using CHAR(3) and the 43 Div literals, here are two illustrative queries:

    Select count(1)
    from PARTMSTP_PT t
    where t.Div = 'atk'

    Select count(1)
    from PARTMSTP_PT t
    where t.Div = cast('atk' as char(3))


    and their Statistics IO output:

    (1 row(s) affected)
    Table 'PARTMSTP_PT'. Scan count 42, logical reads 171174, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    (1 row(s) affected)
    Table 'PARTMSTP_PT'. Scan count 1, logical reads 4245, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    As you can see, the simple Div = 'xxx' filter is scanning all 42 partitions, but the overly complex clause using a cast to char(3) results in the expected single-partition scan.

    It is unreasonable to expect all queries in our ad-hoc reporting system to cast to char(3).  I'm planning to run a similar test case with a partition function defined with varchar(3), but would like to throw out these questions....

    What is the accepted pattern for partitioning (and subsequent querying by ad-hoc and other means) on a CHAR column?  Users, admins, reporting systems, developers, etc., can't possibly be expected to cast for every query they build or execute!

    Is it acceptable to use a varchar partition function against a known CHAR column (if it works), and what additional gotchas should I be considering if this does work?  Will joins be aligned, indexes, indexed views, etc., etc.?

    I don't have a good feeling about these results at such an early stage of testing!


    Tuesday, February 07, 2012 5:39 AM

Answers

  • Your problem is implicit conversion of the CHAR(3) column to the VARCHAR datatype that you provided with = 'atk' in the code.  The CHAR datatype has a lower precedence, and the engine always converts from lower to higher when there is a mismatch of datatypes.  If the conversion occurs on the table-column side of the operation, you get an index scan, which precludes partition elimination.

    http://msdn.microsoft.com/en-us/library/ms190309.aspx

    If you want to use a CHAR column, you need to provide a CHAR filter if you expect the engine to not perform an implicit conversion and force a scan of the entire table or index.  This is a fundamental of good design and development practices, and as you see it will kill performance if you don't do it correctly.  You get similar problems if you have a CHAR column in one table and a VARCHAR column in another table, and you perform a JOIN with those two columns, only it shows as a Probe Residual in the JOIN operation of the execution plan.  Do a search online for "SQL Implicit Conversion" and you will find a number of blog posts that show exactly what you are seeing.
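
    For example, an explicitly typed CHAR(3) filter (a sketch using the table and column names from the original question), whether via CAST or a typed variable, keeps the conversion off the column side:

    ```sql
    -- Both forms present a char(3) value to the comparison, so no
    -- conversion is needed on the column side and partition
    -- elimination can occur.  Names are from the original question.
    SELECT COUNT(1)
    FROM PARTMSTP_PT AS t
    WHERE t.Div = CAST('atk' AS char(3));

    DECLARE @div char(3) = 'atk';
    SELECT COUNT(1)
    FROM PARTMSTP_PT AS t
    WHERE t.Div = @div;
    ```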


    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!


    Tuesday, February 07, 2012 1:47 PM

All replies

  • Jonathan,

    Thanks so much for your rapid and authoritative response, and I hope to elicit a detail or two more from you regarding my partition quandary, as well as this phenomenon in general.

     I am, in fact, familiar with the efficacy of explicit type conversions, avoiding boxing and the like.  However, at the risk of sounding perhaps less SQL savvy than I would like to admit, and maybe even opening a can of worms, I must say this... WOW!

    During my years of SQL development and administration, I must contend that the only rules I have come across relating to the choice of CHAR over VARCHAR are those relating to space efficiency: how long is the max string length, and what will be the expected distribution of lengths of that string.  My immediate recollection is something like a cutoff at less than 8 characters, below which a string should be a CHAR to avoid the overhead involved with varchar, and for longer strings, sticking with CHAR only if the vast majority of the length will usually be utilized.  I don't ever recall reading an explanation of what data type to choose that also cautioned about potential implicit table-side type conversions causing join performance issues or, in this case, worse.  Is there a huge hole out there in introductory material, or have I really been that inattentive, or is this a level of nuance seldom broached?

    Following your suggestion, I googled and found:

    "SQL Implicit Conversion" = 3450 hits

    +"SQL Implicit Conversion" +partition = 8 hits

    Apparently not a lot of info out there for such a potentially important subject (especially for partitioning)!

    This is the first case I've encountered where I regret choosing CHAR for a string so short and wish I had chosen varchar oh so many years ago!

    Changing the partition function definition to varchar(3) threw an error, which I half expected, and, after considering your explanation, it would have done nothing for me anyway, because the real conversion would still be taking place table side, with the same result.

    Only Possible Solutions: (Please correct me if I'm wrong)

    1. always use a cast on where clauses
    2. change the column data type of hundreds of tables...

    Thanks again Jonathan, I do appreciate your time and efforts!


    • Edited by Dwaine Wright Wednesday, February 08, 2012 12:41 AM oh so many spelling errors!
    Wednesday, February 08, 2012 12:37 AM
  • There is a third option, which is to change your code so that it uses a parameterized query and passes the parameter with SqlDbType.Char and a Size of 3; that would solve the issue as well.  The performance issue is that a conversion on the column side of the comparison can change the meaning of the data, so the column has to be scanned entirely, converted to the new type, and then compared to the value being used for filtering.  If the conversion happens the other way, as when you do an explicit CAST of the value to a CHAR(3), you don't have the performance issue.

    You are the first person I know of who has found this out because you didn't get partition elimination.  Implicit conversions are a problem in a lot of code; it is something I see every day in consulting.  There are articles and guides out there that talk about matching data types as a best practice, for example:

    http://www.codeproject.com/Articles/35665/Top-10-steps-to-optimize-data-access-in-SQL-Server

    It really is a matter of, for lack of a better description, "lazy coding" or "lazy design", even though it usually happens to be unintentional, because so much gets handled under the covers for you by the systems being implemented.  I'm not saying that the developers specifically are lazy, just that advances in technology have made it easier to hit problems like this simply because you didn't know something really simple but incredibly important.  In .NET, if you use a parameterized query, you can set a value for a SqlParameter without ever setting a type or size for the parameter.  I've seen systems running 70%+ CPU drop to under 20% just by fixing implicit conversion issues in the code.

    A very basic best practice for most RDBMS systems is to use stored procedures or parameterized queries for all code.  There are lots of reasons for this, ranging from good plan cache reuse to preventing SQL injection, but writing good parameterized code takes time, and it is often skipped in favor of concatenated literal values, leaving SQL Server to determine what data type each value should be.  If it looks like a string, it gets treated as a VARCHAR(8000) unless you mark it as Unicode with N'TextGoesHere', in which case it is treated as an NVARCHAR(4000).  For .NET code, the SqlDbType and Size should be specified for parameters, something I often see left out, and that can result in these problems as well.
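
    In T-SQL itself, the same principle applies with sp_executesql: declaring the parameter with an explicit char(3) type keeps both sides of the comparison on matching types (a sketch using the table and column names from the original question):

    ```sql
    -- Parameterized statement with an explicitly typed char(3)
    -- parameter, so no implicit conversion is needed on the column.
    EXEC sys.sp_executesql
        N'SELECT COUNT(1) FROM PARTMSTP_PT WHERE Div = @div;',
        N'@div char(3)',
        @div = 'atk';
    ```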

    http://msdn.microsoft.com/en-us/library/ms161953.aspx



    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    Wednesday, February 08, 2012 1:18 AM
  • Sorry about the length of this post, it kind of got away from me!

    Ok, I get, agree with AND practice 95% of what you have said when coding .NET apps, but we’re straying a bit from the OP where I stated, “It is unreasonable to expect all queries in our ad-hoc reporting system to cast to char(3).”  I guess what I should have said was, “we have no control over the SQL generated by our reporting system.”  This isn’t .NET my friend, we’re stuck with literals at this point unless I can get the ISV to change code!

    I do appreciate your efforts and this opportunity to learn about something so pivotal to SQL query performance.

    I gave the example of Div = 'xyz' because we use a 3rd party enterprise reporting system that essentially takes user reports built in a GUI and translates them into dynamic SQL statements under the covers.  One "feature" of this system is a static security configuration file that allows us to map app group membership to a column / value pair (i.e., row level security).  The majority of our data access security is configured like "User Group XYZ uses Div = 'XYZ'", and the system automagically adds this as a where clause condition.  (ugh) 

    So, I’ve got some questions and observations.

    1. I have been investigating your explanation.  It occurred to me that most of our clustered indexes begin with the Div char(3) column (which has worked well for us performance-wise, despite the low selectivity).  The fact that these CIs ARE being used for seeking seems to go against the model you are proposing, as I understand it.  Wouldn’t the Div column in this case have to be cast to varchar(8000), and wouldn’t this force a scan?  If what you say is true, how is it that the same query in question performs a CI seek on the non-partitioned (production version of the) table, but a scan on every partition on the test version?
    2. My testing indicates that data type precedence is also string-length sensitive.  Longer strings have higher precedence?  I have tested column/partition function combos with different varchar lengths and don’t get partition elimination until I use Div varchar(8000), which seems to support your explanation, but ONLY for partition elimination, not index scanning, as indicated above.
    3. Could you provide a reference for your statement that “If it looks like a string, it gets treated as a VARCHAR(8000) unless you specify it to Unicode with the N'TextGoesHere', and then it is treated like a NVARCHAR(4000)”?  I have found such a reference on TechNet, but with respect to FORCED parameterization, and I assume that’s not what you are referring to.  My curiosity is piqued regarding this subject!
    4. This is probably a subject for an entirely different thread, but I’m surprised that the decision would be made to have the query processor treat string literals as varchar(8000), the HIGHEST precedence in its class, instead of something like CHAR(n), where n is the actual length of the literal.  This would make literals the LOWEST possible precedence in their class and prevent this whole mess in the first place.  Is there some overwhelming argument for making the choice they did?  You are presenting this as the status quo, but since my OP, I’ve encountered dozens of samples relating to code performance and index function that neglect to cast right-side literal strings.  In addition, in general, would forcing such casting not make code much less readable?

    Please keep in mind, my goal is to partition out the larger (if not all) tables in a way that we can, with impunity, load and post-process individual companies’ data sets independently.  Right now, all tables have a Div char(3) column that is also usually in the clustered index.  It seemed like a natural candidate for partitioning.  I’m willing to go through the effort of changing columns, indexes, etc., to this end, but losing partition elimination due to nuances in the reporting system and SQL Server functionality makes this hope seem a million miles away.
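
    For reference, the kind of per-division load this partitioning is meant to enable might look like the following sketch (the staging table and partition function name are hypothetical; SWITCH requires the staging table to match the partitioned table's structure, indexes, and filegroup, with a CHECK constraint on Div):

    ```sql
    -- Hypothetical sketch: load one division's data into a staging
    -- table, then switch it into the partitioned table.
    -- $PARTITION maps the division value to its partition number.
    DECLARE @p int = $PARTITION.pfDiv('atk');   -- assumed function name

    ALTER TABLE PARTMSTP_Staging
        SWITCH TO PARTMSTP_PT PARTITION @p;
    ```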

    Thursday, February 09, 2012 10:58 PM
  • Do you have demos/examples where you think it is going contrary to this that you can post?  It's going to be a few weeks before I can put a lot of effort into designing a bunch of examples to explain this if you don't.  For your first question, I'd need to see an example table and index definition and the execution plan generated to explain what is happening.  The same goes for number two as well.  For number three, you can see it in the plans that are generated and the data type associated with the CONVERT_IMPLICIT operation.  The mechanism is essentially the same between this, Simple Parameterization, and Forced Parameterization, but I don't know that it is explicitly documented anywhere.  The use of VARCHAR(8000) is probably rooted in the auto-parameterization implementation in the engine and is geared towards a more reusable plan in cache, even under Simple Parameterization.  If everything goes to VARCHAR(8000), it is more likely to be reused for another ad-hoc statement than a CHAR() fixed-length parameter would be.

    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    Thursday, February 09, 2012 11:10 PM
  • /*-----------------------------------------------------------------------------
      Date			: 23 March 2011
      SQL Version           : SQL Server 2005/2008/Denali
      Author		: Jacob Sebastian
      Email			: jacob@beyondrelational.com
      Twitter		: @jacobsebastian
      Blog			: http://beyondrelational.com/blogs/jacob
      Website		: http://beyondrelational.com
    
      Summary:
      This script is part of the demo presented by Jacob Sebastian during 
      Tech-ED India 2011 on "TSQL Worst Practices".
    
      Disclaimer:  
      THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF 
      ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED 
      TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
      PARTICULAR PURPOSE.
    -----------------------------------------------------------------------------*/
    
    
    	IF OBJECT_ID('zTestTable_DW', 'U') IS NOT NULL BEGIN
    		DROP TABLE zTestTable_DW 
    	END
    		GO
    
    	CREATE TABLE zTestTable_DW (ID INT IDENTITY NOT NULL, OrderDate DATETIME, Amount MONEY, Refno_varchar VARCHAR(20), Refno_char char(20) NOT NULL, Refno_int int)
    	INSERT INTO zTestTable_DW (OrderDate, Amount, Refno_varchar, Refno_char, Refno_int) 
    	SELECT TOP 100000
    		DATEADD(minute, ABS(a.object_id % 50000 ), CAST('2010-02-01' AS DATETIME)),
    		ABS(a.object_id % 10),
    		CAST(ABS(a.object_id) AS VARCHAR),
    		CAST(ABS(a.object_id) AS CHAR),
    		a.[object_id]
    	FROM sys.all_objects a
    	CROSS JOIN sys.all_objects b
    	INSERT INTO zTestTable_DW (OrderDate, Amount, Refno_varchar, Refno_char, Refno_int)  SELECT GETDATE(), 100, '555', '555', 555
    
    	-- Create NCIs
    	CREATE NONCLUSTERED INDEX idx_refnovarchar ON zTestTable_DW(Refno_varchar)
    	include(amount)
    
    	CREATE NONCLUSTERED INDEX idx_refnochar ON zTestTable_DW(refno_char)
    	include(amount)
    
    	CREATE NONCLUSTERED INDEX idx_refnoint ON zTestTable_DW(refno_int)
    	include(amount)

    SQL provided to create a sample table with data, borrowed from the author credited in the header.  I added several columns and indexes to allow further testing.

    Running these two queries against the char column and looking at the query plan indicates that, in the most basic index usage scenario, a literal is not treated as varchar(8000):

    SELECT COUNT(*) 
    FROM zTestTable_DW 
    WHERE Refno_char = '555'
    
    SELECT COUNT(*) 
    FROM zTestTable_DW 
    WHERE Refno_char = N'555'

    The first query, under your model, should, as I understand it, cause a scan, since the column-side char(20) is lower in precedence than the assumed varchar(8000).  However, it results in an index seek.  Even a cast to varchar(8000) doesn't force a scan, but forcing the literal to Unicode forces a table scan...


    So, I do NOT see the CONVERT_IMPLICIT happening against a normal, unpartitioned table.  Further examples to follow.



    Tuesday, February 14, 2012 1:21 AM
  • Turn on the Actual Plan in SSMS and execute your two queries.  Then flip over to the Execution Plan tab and right-click to Show Execution Plan XML.  For the first statement you should have something like:

            <StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="TRIVIAL" StatementSubTreeCost="0.0032842" StatementText="SELECT COUNT(*) FROM [zTestTable_DW] WHERE [Refno_char]=@1" StatementType="SELECT" QueryHash="0x42454C4E154217F2" QueryPlanHash="0x9A169C0E64A61316" RetrievedFromCache="true">

    Grab the QueryPlanHash value above from your plan, and plug it into the following query:

    select text
    from sys.dm_exec_query_stats
    cross apply sys.dm_exec_sql_text(sql_handle)
    where query_plan_hash=0x9A169C0E64A61316

    You will get:

    (@1 varchar(8000))SELECT COUNT(*) FROM [zTestTable_DW] WHERE [Refno_char]=@1

    The Unicode one goes to an NCHAR(20), so I'll send an email to a friend and ask about the specifics of the implementation in 2008/2008R2.  I'd have to go back to 2005 and see if this is a change in parameterization behavior compared to what I recall.  Give me a few days to look at this in more detail and ask some questions of people at Microsoft, and I'll get back to you.

    BTW: Jacob is a good friend of mine and he is happy to have his code cited like that.

    EDIT:

    If you set Parameterization to Forced for the database, the second query generates the following:

    (@0 nvarchar(4000))select COUNT ( * ) from zTestTable_DW where Refno_char = @0

    so it may be that I am mixing two functionalities up a little here, but your original partition elimination issue is still going to be due to the CONVERT_IMPLICIT occurring on the column side.  You can verify that in the plan.


    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!


    Tuesday, February 14, 2012 1:39 AM
  • I appreciate it, Jonathan.

    FYI, this is on 2008 SP1. (10.0.2531)

    I'll now show the examples of the partitioning issue.... 

    Create the partition fn/sch, drop the example indexes, add a CI on the scheme:

    IF  EXISTS (SELECT * FROM sys.partition_schemes WHERE name = N'psCharRight')
    DROP PARTITION SCHEME [psCharRight]
    GO
    
    IF  EXISTS (SELECT * FROM sys.partition_functions WHERE name = N'pfCharRight')
    DROP PARTITION FUNCTION [pfCharRight]
    GO
    
    CREATE PARTITION FUNCTION [pfCharRight](char(20)) AS RANGE RIGHT FOR VALUES (N'0', N'1', N'2', N'3', N'4', N'5', N'6', N'7', N'8', N'9')
    GO
    
    CREATE PARTITION SCHEME [psCharRight] AS PARTITION [pfCharRight] ALL TO ([PRIMARY])
    GO
    
    --drop the current sample indexes
    DROP index idx_refnochar ON zTestTable_DW
    DROP index idx_refnovarchar ON zTestTable_DW
    DROP index idx_refnoint ON zTestTable_DW
    
    --now create PK (with CI) on partition
    ALTER TABLE [dbo].zTestTable_DW ADD  CONSTRAINT zTestTable_DW_PK PRIMARY KEY CLUSTERED 
    (
    	ID ASC,
    	Refno_char ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [psCharRight](Refno_char)
    GO
    

    Now run the following; viewing the execution plans shows the query optimizer's different handling (or at least end results) of implicit conversion due to data type precedence rules between index use and partition use.

    SELECT COUNT(*) 
    FROM zTestTable_DW 
    WHERE Refno_char = '555'
    
    SELECT COUNT(*) 
    FROM zTestTable_DW 
    WHERE Refno_char = N'555'
    
    SELECT COUNT(*) 
    FROM zTestTable_DW 
    WHERE Refno_char = CAST('555' AS CHAR(20))

    Hmmm...  I initially wrote this post using the phrase sargable (to indicate a "Search ARGument ABLE" predicate that results in index seeking).  However, in this partition elimination case, sargable doesn't seem to apply!  Maybe "Partition Elimination aRGument ABLE"?  Shall we call a predicate "PERGable" if it allows partition elimination? ;-)  In this case, since the partitioned column does not have to be the first column in any index, index seeking is not what's at stake.


    Now the query plans show a different story for '555'.  Although it is a simple literal string and would be assumed to be PERGable against a CHAR(20) (since it is SARGable in the index example), it forces a scan of all 11 partitions and is thus not PERGable.

    I'm curious about the difference in the query plans between the plainly visible CONVERT_IMPLICIT in the N'555' case, used here to illustrate explicitly forcing a column-side up-conversion, versus the up-conversion in the '555' case, which is much less obvious and can be detected only by Jonathan's previously posted dm_exec_query_stats method for seeing the parameter use, or by noticing the lack of partition elimination; far too subtle for my taste.

    The best case scenario in the partitioned table instance is the explicit cast to char(20) of the literal.  This results in a scan of only 1 partition.

    Please note: I purposefully made the first CI column different than the partitioning column to force this case.  I didn't want a "best case" scenario, and my ultimate intent (practicable or not) is to gain a SECOND ace in the hole by not spending the first CI column on a less selective (but in this case useful AND partitioned) column.  For the sake of sanity, I'm going to start another thread to question the efficacy of this approach and dive into some of the strange results I've seen.

    Wednesday, February 15, 2012 12:51 AM
  • The 'problem' here is related to simple parameterization as Jonathan says.  The idea of simple parameterization is to promote query plan reuse of course, but the question is what type SQL Server should assign to the string parameter.  It would not make sense, in general, to produce a parameterized form with a specific length because the type definition forms part of the parameterized text of the query.  Having 8000 cached plans with parameters of varchar(1)... varchar(8000) would not be very useful, so SQL Server chooses varchar(8000) or nvarchar(4000) for ANSI and Unicode string literals respectively.  So, the parameterized form of the query is:

    (@1 varchar(8000))SELECT COUNT(*) FROM [dbo].[zTestTable_DW] WHERE [Refno_char]=@1

    When we add an explicit cast to char(20), the parameterized form is:

    (@1 varchar(8000))SELECT COUNT(*) FROM [dbo].[zTestTable_DW] WHERE [Refno_char]=CONVERT([char](20),@1)

    So why does the second form result in dynamic partition elimination while the first one does not?  Because of the possibility of string truncation.  Only the second query guarantees that the char(20) result can be matched to a partition without truncation.  To see this for yourself, consider that the char(20) query above results in dynamic partition elimination (using RangePartitionNew in a Seek Keys predicate), while an explicit cast to char(21) or varchar(21) does not:

    -- Dynamic partition elimination using RangePartitionNew
    SELECT COUNT(*) FROM dbo.zTestTable_DW WHERE Refno_char = CONVERT(char(20), '555')
    -- No partition elimination
    SELECT COUNT(*) FROM dbo.zTestTable_DW WHERE Refno_char = CONVERT(char(21), '555')
    

    If you want static partition elimination, you need to avoid simple parameterization completely.  This is not usually much of a problem because (a) SQL Server is very conservative with simple parameterization; and (b) it is easy to defeat.  A couple of ways to avoid simple parameterization without affecting the semantics of the test queries follow:

    SELECT COUNT(*) FROM dbo.zTestTable_DW WHERE Refno_char = '555' AND 1 != 2
    SELECT COUNT(*) FROM dbo.zTestTable_DW WHERE Refno_char = '555' OPTION (MAXRECURSION 100)
    

    I should stress that SQL Server goes to some lengths to retain SARGability even where users are a bit sloppy with types.  The example queries don't much care whether the partition scheme or column is defined as char or varchar, for example, and the same goes for nchar and nvarchar.  There is also no 'precedence' within string types that takes account of length, so please forget about that idea.  Where sloppiness with types can cause problems is where collation issues mean SQL Server cannot transparently correct the sloppiness without affecting the query semantics:

    -- Seek (thanks to heroic efforts by the optimizer)
    DECLARE @T TABLE (col1 varchar(10) COLLATE Latin1_General_CI_AI UNIQUE)
    SELECT * FROM @T WHERE col1 = N'a'
    GO
    -- Scan
    DECLARE @T TABLE (col1 varchar(10) COLLATE SQL_Latin1_General_CP1_CI_AI UNIQUE)
    SELECT * FROM @T WHERE col1 = N'a'
    

    The SQL collation has different rules for ANSI and Unicode comparisons, so the second example can at best scan the index.  You can read more about the magic performed on your behalf by the query optimizer in the first case in my blog post here.  It always pays to be careful with types and collations, especially ANSI and Unicode.  There is an example of carelessness (benign in this case) in the PARTITION FUNCTION definition:

    CREATE PARTITION FUNCTION [pfCharRight]
        (char(20)) 
    AS RANGE RIGHT 
        FOR VALUES (N'0', N'1', N'2', N'3', N'4', N'5', N'6', N'7', N'8', N'9')

    The type definition is char, but the supplied values are all Unicode!

    Paul White


    Thursday, February 16, 2012 7:41 AM
  • Hey Paul,

    I knew I had this pegged right when I asked you to look at it.  Nobody knows the internals of query optimization like you do.  I completely missed the unicode declaration in the partitioning function like that.  WTG on solving another non-trivial plan optimization issue once again!

    Cheers,


    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    Thursday, February 16, 2012 8:02 AM
  • I ran into this issue, and this thread helped me immensely.  Thank you all.  One thing to add.

    When you use Management Studio to script out a partition function, it will put the N in front of the literals, making them Unicode, regardless of whether the function was created without the N.

    CREATE PARTITION FUNCTION [pfCharRight]
        (char(20)) 
    AS RANGE RIGHT 
        FOR VALUES ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')

    If you go into SSMS, right-click on the function, and select Script, you get the following.

    CREATE PARTITION FUNCTION [pfCharRight](char(20)) AS RANGE RIGHT FOR
    VALUES (N'0 ', N'1 ', N'2 ', N'3 ', N'4 ', N'5 ', N'6 ', N'7 ', N'8 ', N'9 ')
    GO



    Wednesday, September 19, 2012 8:45 PM
  • Eric,

    It's nice to see someone else identifying the same problem and gaining benefit from the thread.  I'm ESPECIALLY impressed that you figured out the last hanging chad; good work!  Kind of surprising that an MVP wouldn't know about such a basic bug in the toolset.

    I determined this immediately after reading Paul White's final dig...

    "There is an example of carelessness (benign in this case) in the PARTITION FUNCTION definition"

    ...since I KNEW my original scripts didn't specify Unicode.  Would have kind of defeated the purpose of the sample, don't you think!?

    I started a response several times to inform him of his faux pas, but it just kept turning into a rant.  I finally decided to let Paul figure it out on his own.

    Cheers!


    • Edited by Dwaine Wright Wednesday, September 19, 2012 11:15 PM I didn't like my unsavory comment in retrospect.
    Wednesday, September 19, 2012 9:44 PM
  • Kind of surprising that an MVP wouldn't know about such a basic bug in the toolset. 

    I determined this immediately after reading Paul White's final dig...

    "There is an example of carelessness (benign in this case) in the PARTITION FUNCTION definition"

    ...since I KNEW my orginal scripts didn't specify unicode.  Would have kind of defeated the purpose of the sample, don't you think!?

    I started a response several times to inform him of his faux pas, but it just kept turning into a rant.  I finally decided to let Paul figure it out on his own.

    It's sad when you appreciate the time and effort someone spends answering your question, only to be repulsed by a decidedly immature, pompous, and unfriendly close.

    Dwaine,

    You're reading so much more into that statement than was intended.  I guess misinterpretation is always a risk in forums where text has no human context.  If we had been discussing this in person, it would have been clear to you that I was not 'having a dig' at you.

    I am sorry it upset you so much, but it is an example of carelessness with types - just by SSMS not you :)

    No, I didn't know SSMS did that when scripting (I do now, so thanks).  It is benign though (as I said) because the types are converted and stored correctly in sys.partition_range_values.
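
    As a cross-check, a sketch using standard catalog views can show the stored boundary values and their actual base type:

    ```sql
    -- Boundary values as actually stored for the partition function;
    -- the sql_variant 'value' column carries the converted (char)
    -- values even when the CREATE statement used N'' literals.
    SELECT pf.name,
           prv.boundary_id,
           prv.value,
           SQL_VARIANT_PROPERTY(prv.value, 'BaseType') AS base_type
    FROM sys.partition_functions AS pf
    JOIN sys.partition_range_values AS prv
        ON prv.function_id = pf.function_id
    WHERE pf.name = N'pfCharRight'
    ORDER BY prv.boundary_id;
    ```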

    I am glad you appreciate the time and effort I put into answering your question, though.

    By the way, this issue came up again on another forum recently, so I blogged about it: http://bit.ly/PartElim

    Paul


    • Edited by Paul White NZMVP Wednesday, September 19, 2012 10:11 PM
    • Proposed as answer by Naomi N Wednesday, September 19, 2012 10:26 PM
    Wednesday, September 19, 2012 10:09 PM
  • Paul,

    Thanks for the thoughtful and tempered response.  I apologize for my unwarranted comment, which reads even harsher now than when I first wrote it.

    Wednesday, September 19, 2012 11:16 PM
  • Paul,

    I hope you can enlighten me regarding partition elimination in smalldatetime data type:

    http://stackoverflow.com/questions/8274632/how-can-i-make-sure-partition-prunning-works-at-run-time

    Tuesday, November 20, 2012 10:36 AM