Data Type length Declaration -> Storage and Performance

Answered Data Type length Declaration -> Storage and Performance

  • Friday, February 08, 2013 4:22 AM
     
     

    Hello Experts,

    Our developers keep declaring string columns in table as nvarchar(1000) instead of appropriate length for that column. For e.g., they declare zip to be nvarchar(1000). When I asked about it, they say SQL server will only use only required amount of bytes to store which I am aware of.

    But still, it just doesn't feel right to see all columns in table declared as nvarchar(1000) when they are not necessary. But, I don't know how declaring nvarchar(1000) would hurt us in terms of performance or storage or any other functionality?

    Can you please help?

All Replies

  • Friday, February 08, 2013 4:32 AM
     
     

    Storage is the same in both cases.

    But you should use correct data sizes to prevent bad data being entered.

  • Friday, February 08, 2013 4:35 AM
     
     

    You may read the below link:

    http://www.sql-server-performance.com/2007/datatypes/


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

  • Friday, February 08, 2013 4:50 AM
     
     
    I think the answer should be No. See:
    nchar(n)

    Fixed-length Unicode character data of n characters. n must be a value from 1 through 4,000. Storage size is two times n bytes. The SQL-92 synonyms for nchar are national char and national character.

    nvarchar(n)

    Variable-length Unicode character data of n characters. n must be a value from 1 through 4,000. Storage size, in bytes, is two times the number of characters entered. The data entered can be 0 characters in length. The SQL-92 synonyms for nvarchar are national char varying and national character varying.
    Remarks

    When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified with the CAST function, the default length is 30.

    Use nchar when the data entries in a column are expected to be consistently close to the same size.

    Use nvarchar when the data entries in a column are expected to vary considerably in size.

    Many Thanks & Best Regards, Hua Min



  • Friday, February 08, 2013 4:52 AM
     
     

    When you store a value like 'Fred' in a name column defined as NVARCHAR(1000), SQL Server will take only the space of 4 characters. This is also true for VARCHAR. And an additional 2 bytes will take to store the length of each column value. Generally this will not affect performance.

    However the bytes taken by NVARCHAR and VARCHAR is different for storing the same value. Stiring a single character in NVARCHAR support unicode characters. So the value of 'Fred' takes 8 bytes in NVARCHAR and 4 bytes in VARCHAR. If your aplication is not supporting international characters, using VARCHAR will save space than using NVARCHAR.

    And also if you declare a FirstName and LastName as NVARCHAR(1000) or VARCHAR(1000), and suppose you need to create an index of FirstName and LastName then index creation will succed with a warning like 'The maximum length is 900 bytes....For some combination of large values, the insert/update operation will fail'. Though you may not insert a lengthy name that exceed 900 bytes, this is not a proper way.

    I suggest (in a design point of view) you to find appropriate data type and define the column with only required length. That will be easy to maintain and code.


    Krishnakumar S

  • Friday, February 08, 2013 4:58 AM
     
     

    You may also read the below one: (more specific to your question)

    http://blogs.adatis.co.uk/blogs/david/archive/2011/05/05/size-matters.aspx


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

  • Friday, February 08, 2013 5:07 AM
     
     Answered

    The important thing is not the space used or the performance, it's the integrity of your data.  Just like any other constraint, using the proper datatype is important to maintaining your database integrity.  You should only allow 1000 characters in a column if actually having 1000 characters in that column is a valid entry.  If having 1,000 characters in your column is not valid, then choose the appropriate data length.

    But, yes, nvarchar will only store the used number of bytes + a two byte overhead.  So, an nvarchar(1000) column containing 25 characters will take the same amount of space as an nvarchar(40) column containing 25 characters.  But that doesn't mean that in all cases nvarchar(1000) is going to be free.  Consider what happens if you have a column that is to contain the US state code abbreviations (which are always 2 alphabetic characters and so can be stored in a char(2) column.  If you make your state column char(2) and store 'NY' in it, that takes 2 bytes.  If you make your state column nvarchar(1000) and store 'NY' in it, that will take 6 bytes (note, 3 times as much).  That's because nvarchar takes 2 bytes per character and char takes only one.  Also, every nvarchar column has a 2 byte overhead.

    Tom

  • Friday, February 08, 2013 6:21 PM
     
     

    Probably, a varchar (instead of nvarchar) would suffice. See the other posts on the difference is storage usage.

    See the other posts on maximum index key restrictions. What this means, is that if the column (that can use more than 900 bytes of storage) is indexed, and a malicious person can manipulate the data entry, he could force the failure of an insert (using a value that requires more than 900 bytes of storage) which may have all kinds of side effect within your system if they are not properly error handled.

    Also, the rest of your application may not expect large values and may not be able to handle them. For example, you may be printing the zip on an envelope. Let's say that some 100 characters fit on one line on the envelope. Then it is an interesting experiment what will happen if there happens to be a "zip" of 1000 characters!

    In general, the problem with not following best practices is, that it is hard to think of all the ways that abusing them could cause problems.


    Gert-Jan

  • Friday, February 08, 2013 7:15 PM
     
     

    also look at

    http://www.simple-talk.com/sql/performance/the-seven-sins-against-tsql-performance/

    It seems your programmers are laz (see i can't even finish typing lazy)!!

    Gert-Jan talks about reporting, what if the field is nor VARCHAR(1000) but only CHAR(1000) (Hey! I only have to type four characters rather than 7) then you will get a result like '20500                                                                    '. (notice all the trailing spaces)

    What if the data is some other type, like a date.  say  a birth date is VARCHAR(1000)? What does '3-5-1975' mean? March 5th or May 3rd? or what if the person is really old:

    '12.11.10' (ANSI: November, 10, 1912)

    '12.11.10' (German: November 12, 1910)

    '12/11/10' (Japan: November 10, 1912)

    '12/11/10' (USA: December, 11, 1910)

    http://www.sqlusa.com/bestpractices/datetimeconversion/