Table layout column definition by variable type RRS feed

  • Question

  • Good morning

    I created a table definition in SQL Server 2K8 and submitted it for review.  It was recommended that I change my logical column groupings to a table definition that had varchar fields at the end of the table.  This caught me off guard as I had never heard that before.  I looked it up and I would have to think that SQL Server would optimize column storage automatically and search times wouldn't be impacted by the existence of varchar data in a table row.  I cannot however find a SQL Server, much less a SQL Server 2K8 answer to prove/disprove my reasoning.  Any insight would be most helpful as I am not a DBA.  I am sure that this has to be here somewhere on the site but I couldn't find it by keywords

    Thank you.

    Tuesday, April 22, 2014 1:25 PM


All replies

  • Your terminology is unusual - I have no idea what a "logical column grouping" is and that term sounds suspiciously like a normalization problem.   Regardless, if you want comments on a table you should post the DDL for the table.  I do however have 2 comments. 

    1. If your reviewers made a suggestion and your reviewers are assigned to such a task based on their proficiency, is it wise to argue against their recommendations?  An alternative viewpoint is that you know more about sql server than the reviewers. Be careful how you proceed. 
    2. A code review should be an educational process.  Recommendations should also include the reasons that support any changes.  If this is missing from the process, than the process is not functioning as well as it could. 

    And one last comment.  There are many myths and traditions that people have "learned" over time - some of which may have been applicable to much older versions of sql server but are no longer justifiable.  Yet we still cling to them for many reasons.  This may be one of those instances.  If some senior person says to move a column because it is "better" for some reason, does it really matter to you if there is no real advantage to doing so?  For performance tuning you generally focus on indexes and not the physical order of columns in the actual table.  If you are curious then the best advice is to test it yourself. 

    Tuesday, April 22, 2014 3:58 PM
  • Fair points.  I have de-indentified the table and added it below.  By logical grouping I mean that in the layout I submitted (below), the varchar(20) id fields are next to their corresponding timestamps and if you were reading this table they at least seem to me to go together.  The recommendation was that all the Ids fields belong at the end of the table definition (without explanation) due to their data types (varchar).  Looking online I see some people asserting that varchar fields affect search performance, something I have never heard of before.  I am attempting to educate myself on this matter as it is something I have never heard of before.  I can always create a View to set the columns however I like but if there is something to be learned, I would like to learn it.  If this is simply not the case, I will just adjust to work around an incorrect assertion in this instance and not let it affect future decisions. 

    CREATE TABLE [dbo].[NewTable]
     KeyFieldA dec(5,0) not null,
     KeyFieldB dec(8,0) not null,
     KeyDate date not null,
     RequestQuantity int not null,
     RequestCode smallint not null,
     CreatedDateTime datetime not null,
     CreatedId varchar(20) not null,
     LastUpdatedDateTime datetime,
     LastUpdatedId varchar(20),
     ResponseQuantity int,
     ResponseCode smallint,
     ResponseUpdateDate datetime,
     ResponseUpdateId varchar(20)
     CONSTRAINT PK_SpecialOrder PRIMARY KEY NONCLUSTERED ([keyFieldA], [KeyFieldB], [KeyDate])



    • Edited by Cheese Bread Tuesday, April 22, 2014 5:06 PM grammar faux pas
    Tuesday, April 22, 2014 4:56 PM
  • You are fine, the person who made the suggestion is clueless.

    Tuesday, April 22, 2014 5:09 PM
  • In this case, I have to agree Soumen, although "clueless" may be a bit harsh.  Here is my short review in no particular order:

    • I am concerned about a column named "CreatedId" defined as varchar(20) since "ID" columns typically are associated with identity (or at least numeric) columns - but it is likely that you are storing a user's ID here which is character.  Are you certain that 20 characters is a reasonable limit? 
    • So let's address the var part of your varchar columns.  Do they really need to be variable?  Think about that before you say yes.  How variable (length-wise) are the values that will be stored?  If they are char(20), that completely avoids the original recommendation.  There is a tradeoff between allocating a specific amount of space for these columns in each row (obviously some is wasted) and allocating exactly what is needed when the values are set/changed (which can cause row movement within the table's pages).  60 bytes isn't much - and disk space is cheap (though disk throughput is important).  One can well argue that you are not gaining much by using varchar over char unless you anticipate huge numbers of rows. But then that raises a forest vs. trees argument.
    • I suggest that you always indicate whether a column is nullable or not rather than relying on a default connection setting.
    • A table should always have a clustered index unless there is a specific need to implement a heap.  Does this table have one? If not, why not?
    • Indexing in general should be addressed.
    • Should there be foreign keys or constraints for requestcode and responsecode?  Given the names there is likely a relationship between request and response.  What is it and is it properly modeled or constrained?
    • Should your quantity columns to constrained to positive values (i.e., > 0)?
    • Is it you intention to implement the recording of ResponseUpdateDate and ResponseUpdateId via a trigger?  If so, I would consider that part of your DDL to be reviewed alongside that of the table itself.
    • For consistency, you have "ResponseUpdateDate" but "LastUpdateDateTime" and "CreatedDateTime". 
    • Can KeyDate be in the past?  Should it be constrained? 
    • You have create and modified information for the row as a whole (presumably that is how it will be implemented).  Are you certain that you do not need to know specifically when the request columns were last modified.  Obviously usage of the table will drive this decision.  Similarly, do you not need to know specifically when the response columns were first set to non-null values?

    And one last comment.  A best practice is to explicitly specify the columns to be included in the resultset of a query (as opposed to using "*" for all columns). If one follows that practice then the argument that the columns of a table NEED to be in any particular order often fades away. That also means that you have no need to define a view that does nothing but select specific columns in a specific order from a specific table.  And technically speaking I cannot think of any practical reason to create such a view in the first place.  If you are still concerned, you can have a look at the link below (which will lead you to more links etc.).

    column order doesn't matter til it does

    Tuesday, April 22, 2014 6:18 PM
  • I meant to include the following from Paul Randal - who is much more knowledgable

    anatomy of a record

    Tuesday, April 22, 2014 6:21 PM
  • I couldn’t find a better word. He was talking about performance about varchar column in a table. Thanks.

    Tuesday, April 22, 2014 6:35 PM
  • Generally the fastest tables are with narrow columns and fixed size.

    Also generally the position of variable size columns should not make any performance difference.

    In your case using char(20) instead of varchar(20) will likely help with performance.

    Kalman Toth Database & OLAP Architect Free T-SQL Scripts
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012

    • Marked as answer by tracycai Tuesday, April 29, 2014 12:24 PM
    Wednesday, April 23, 2014 12:50 AM