none
Nvarchar data type vs varchar data type - SQL Server 2012

    Question

  • Hi,

    which are the differences between nvarchar data type and varchar data type?

    When using nvarchar and when using varchar?

    Thanks

    Tuesday, April 08, 2014 11:09 AM

Answers

  • ETL performance is an user requirement and moreover I'm using SSIS Connectors for Oracle by Attunity.

    Again, functional requirements come before business requirements.

    I can't speak of the performance in the context of Oralce, SSIS, or Attunity, but inside the engine, the following can be noted:

    *  One-byte varchar data takes up less space than nvarchar, and with more rows per page, you use less memory.

    *  Then again, with compression in force, ASCII data in nvarchar column takes up only byte from SQL 2008 R2 and on. (*Not* SQL 2008 R1!)

    * With an SQL collation, operations on varchar are considerably faster, at least with 8-bit code pages, because of the smaller character repetoire. This difference can be drastic for operations of the type LIKE '%abc%', when the entire string must be scanned. For plain equality, less so.

    * With a Unicode collation, varchar is somewhat slower, because all operations are carried out with Unicode, so there is a conversion forth and back.

    * But the most serious performance implication may be this:

        SELECT * FROM indexedvarchar = @nvarcharvalue

      This results in an implicit conversion of the table column. For a Windows collation the index will still be used, albeit not as efficient. For an SQL collation, the index is dead and you get a scan which can be very expensive. Such disasters are not uncommon since many frameworks defaults to Unicode strings.

    By now, you may have realised that the question you should have asked is which collation you should use. Answer: the one that fits the business requirements.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Tuesday, April 08, 2014 1:30 PM

All replies

  • Simple google search would give lots of reference:

    varchar [ ( n | max ) ]

    Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying orcharacter varying.

    nvarchar [ ( n | max ) ]

    Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.

    Some of my search are below:

    http://sqlhints.com/2011/12/23/difference-between-varchar-and-nvarchar/

    Tuesday, April 08, 2014 11:16 AM
  • which are the differences between nvarchar data type and varchar data type?

    varchar uses one byte per character, for a total of 256 possible code points.  The first 128 characters are the standard ASCII characters.  The remaining 128 characters that can be stored are determined by the collation code page.

    nvarchar stores Unicode data and uses 2 bytes per character, for a total of 65,536 possible code points. 

    When using nvarchar and when using varchar?

    Unicode is a good choice when you need to store non-ASCII character or store characters from a mix of Latin and non-Latin languages.  Like all data types, one should choose the type most appropriate for the domain of data stored.  If storage size is a concern, consider row or page compression.


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Tuesday, April 08, 2014 11:21 AM
  • Hi Dan, thanks for your reply and about the point of view of ETL performances?

    Thanks

    Tuesday, April 08, 2014 11:43 AM
  • nvarchar is for Unicode data and permits you to mix all sorts of languages in the same colunm. The simple formula is that with nvarchar, every character takes up two bytes, although with the SC collations added in SQL 2012 this is not true.

    varchar data is always tied to a code page, and can only be used for languages fitted for that code page. Dan said that with varchar every charactgr takes one byte. This is not entirely accurate. It is true for code pages such as 1250, 1251 and 1252 (For Cyrilliac, Eastern European and Western European languages respectively.) However, in code pages for Far Eastern languages, non-ASCII characters take up two bytes. (But the limit in varchar(20) is always 20 bytes.)

    It is worth noting that there are not code pages fitting all languages. For instance, if you need to work with data in Telugu, you will need to use nvarchar, and you must use a collation with 90 or 100 in the name.

    I would generally recommend nvarchar for columns that is to hold names. Even if your business is not international today, it may be tomorrow, and changing after the fact can be a killer. However, for columns that is hold codes (status codes, currency codes, product codes), varchar may be a better choice, because such codes typically only use the English alphabet A to Z no matter the environment. (And I would also argue that a binary collation is the best choice for these.)

    Hi Dan, thanks for your reply and about the point of view of ETL performances?

    I would say that this is a case of the wrong question. You choose the data type from business requirements, not performance.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Tuesday, April 08, 2014 11:56 AM
  • Avoiding data type conversions will improve ETL performance.  But not all ETL is equal from a performance perspective so it depends on what your critical path and bottleneck is.


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Tuesday, April 08, 2014 12:00 PM
  • ETL performance is an user requirement and moreover I'm using SSIS Connectors for Oracle by Attunity.

    Thanks

    Tuesday, April 08, 2014 12:10 PM
  • You choose the data type from business requirements, not performance.

    +1.  I meant to include that in my response.


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Tuesday, April 08, 2014 12:18 PM
  • ETL performance is an user requirement and moreover I'm using SSIS Connectors for Oracle by Attunity.

    Again, functional requirements come before business requirements.

    I can't speak of the performance in the context of Oralce, SSIS, or Attunity, but inside the engine, the following can be noted:

    *  One-byte varchar data takes up less space than nvarchar, and with more rows per page, you use less memory.

    *  Then again, with compression in force, ASCII data in nvarchar column takes up only byte from SQL 2008 R2 and on. (*Not* SQL 2008 R1!)

    * With an SQL collation, operations on varchar are considerably faster, at least with 8-bit code pages, because of the smaller character repetoire. This difference can be drastic for operations of the type LIKE '%abc%', when the entire string must be scanned. For plain equality, less so.

    * With a Unicode collation, varchar is somewhat slower, because all operations are carried out with Unicode, so there is a conversion forth and back.

    * But the most serious performance implication may be this:

        SELECT * FROM indexedvarchar = @nvarcharvalue

      This results in an implicit conversion of the table column. For a Windows collation the index will still be used, albeit not as efficient. For an SQL collation, the index is dead and you get a scan which can be very expensive. Such disasters are not uncommon since many frameworks defaults to Unicode strings.

    By now, you may have realised that the question you should have asked is which collation you should use. Answer: the one that fits the business requirements.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Tuesday, April 08, 2014 1:30 PM
  • If all what you said is true, why Entity Framework default is nvarchar (and same for adding string type parameters in ADO.NET if explicit type is not specified)?

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Tuesday, April 08, 2014 4:19 PM
  • If all what you said is true, why Entity Framework default is nvarchar (and same for adding string type parameters in ADO.NET if explicit type is not specified)?

    So does Java. And it is a reasonable choice. In .Net all strings are Unicode. So why default to something that could lead to data loss.

    And it is worth noting that SQL collations are officially retained for backwards compatibility. Nevermind that with the system locale set to US English, the default is an SQL collation. (But with all other system locales, the default collation is another.)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Tuesday, April 08, 2014 9:59 PM
  • It is curious that using SSIS Connectors for Oracle I see string data for Oracle tables as varchar type, but using the Oracle provider for OLE DB I see string data as nvarchar type and I'm constrained to convert these data into varchar type for string data of my SQL Server tables.

    Thanks


    Note: I cannot to use "Unpropose an answer" and Site Feedback
    • Edited by pscorca Wednesday, April 09, 2014 7:26 AM
    Wednesday, April 09, 2014 7:25 AM
  • 8> It is curious that using SSIS Connectors for Oracle I see string data for Oracle tables as varchar type, but using the Oracle provider for OLE DB I see string data as nvarchar type and I'm constrained to convert these data into varchar type for string data of my SQL Server tables.

    That's as SSIS question and is way outside my realm. And this forum for that matter.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, April 09, 2014 9:14 AM
  • You should use varchar (1 byte letters) to save space in a table, unless nvarchar (2 bytes letters) required.

    If you want to play it safe to accept foreign language strings, use nvarchar.

    The designer of DimProduct even used nvarchar for Color & EnglishProductName to be on the safe side.

    CREATE TABLE [dbo].[DimProduct](
    	[ProductKey] [int] IDENTITY(1,1) NOT NULL,
    	[ProductAlternateKey] [nvarchar](25) NULL,
    	[ProductSubcategoryKey] [int] NULL,
    	[WeightUnitMeasureCode] [nchar](3) NULL,
    	[SizeUnitMeasureCode] [nchar](3) NULL,
    	[EnglishProductName] [nvarchar](50) NOT NULL,
    	[SpanishProductName] [nvarchar](50) NOT NULL,
    	[FrenchProductName] [nvarchar](50) NOT NULL,
    	[StandardCost] [money] NULL,
    	[FinishedGoodsFlag] [bit] NOT NULL,
    	[Color] [nvarchar](15) NOT NULL,
    	[SafetyStockLevel] [smallint] NULL,
    	[ReorderPoint] [smallint] NULL,
    	[ListPrice] [money] NULL,
    	[Size] [nvarchar](50) NULL,
    	[SizeRange] [nvarchar](50) NULL,
    	[Weight] [float] NULL,
    	[DaysToManufacture] [int] NULL,
    	[ProductLine] [nchar](2) NULL,
    	[DealerPrice] [money] NULL,
    	[Class] [nchar](2) NULL,
    	[Style] [nchar](2) NULL,
    	[ModelName] [nvarchar](50) NULL,
    	[LargePhoto] [varbinary](max) NULL,
    	[EnglishDescription] [nvarchar](400) NULL,
    	[FrenchDescription] [nvarchar](400) NULL,
    	[ChineseDescription] [nvarchar](400) NULL,
    	[ArabicDescription] [nvarchar](400) NULL,
    	[HebrewDescription] [nvarchar](400) NULL,
    	[ThaiDescription] [nvarchar](400) NULL,
    	[GermanDescription] [nvarchar](400) NULL,
    	[JapaneseDescription] [nvarchar](400) NULL,
    	[TurkishDescription] [nvarchar](400) NULL,
    	[StartDate] [datetime] NULL,
    	[EndDate] [datetime] NULL,
    	[Status] [nvarchar](7) NULL)


    Kalman Toth Database & OLAP Architect Free T-SQL Scripts
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012

    Thursday, April 17, 2014 7:22 PM
  • ... even used nvarchar for Color & EnglishProductName ...

    Because sometimes an "English" name uses non-English characters.  E.g., Häagen-Dazs

    Thursday, April 17, 2014 8:08 PM
  • Well, yes, that's the point: play it safe.

    For that particular name, assuming USA default collation, varchar is sufficient:

    SELECT CONVERT (varchar, N'Häagen-Dazs')
    -- Häagen-Dazs

    But again a developer can't keep it trying for all possible foreign words used in English.


    Kalman Toth Database & OLAP Architect Free T-SQL Scripts
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012

    Thursday, April 17, 2014 8:17 PM