locked
When is NVARCHAR Really Necessary? RRS feed

  • Question

  • I have a fairly large database (~30GB) which backs a fairly complex application. As it is right now, this application only services english speaking countries.

    My task is to expand the applications reach to around 10 non-english speaking countries, as well as a few english speaking countries we don't currently target.

    We have several tables in the database that store user-visible text, and they all currently use varchar. I figured I would need to change these to nvarchar, but as I read about it in more detail I became less sure.

    Do I really need to switch to nvarchar if the countries I'm targeting don't require unicode to represent their character sets? Won't varchar handle single-byte character sets just fine?

    Below is the list of non-english speaking countries I'll need to target:

    • Austria
    • Belgium
    • Denmark
    • Finland
    • France
    • Germany
    • The Netherlands
    • Norway
    • Spain
    • Sweden
    Aren't all these countries single-byte character sets that will work just fine with varchar? I'd like to stick with varchar if possible, both from a work point of view, and from a performance/db size point of view. Some of the tables utilizing varchar are many GBs in size.

    Thanks.
    Wednesday, March 5, 2008 7:26 PM

Answers

  • You need NVarchar because all are in Europe which does not use ASCII by default, you not only need NVarchar you also need the correct collation and code page for each country or you will run into character conversion issues.  When you are getting character conversion all other issues are not relevant because you don't have an application.  SQL Server and other RDBMS use UCS-2 instead of UTF16 or UTF 8 which makes it complicated to deal with all accents and other language specific issues.  Run a search for collation and code pages in SQL Server BOL, if you still have more questions I can answer them I have seen 32 languages application running without issues in Asp.net. 

     

    Wednesday, March 5, 2008 8:04 PM

All replies

  • Hi RDM,

     

    Cpy and past the following link in your local BOL:

    ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/instsql9/html/11ce1a3d-8314-41a3-be5f-03db90bea61b.htm

     

    It gives you details about the collations and various page codes. Most the countries that you have listed can be supported with page code 1252.

     

    Now depending on your default collation, selected at the time of installation, you might not need to use nvarchar. If your system does not support page code 1252, you can modify your table collation using alter table statement. Further more,

    you can also specify the collation of your individual columns.

     

    In short you can set your collation at sever level, database level, table level and column level.

     

    If you need to convert data between diferent collation, you can use COLLATE clause.

     

    At last, you must use nvarchar or nchar when your data is unicode.

     

    I hope it helps.

     

    Best regards,

    Najm

     

    Wednesday, March 5, 2008 7:59 PM
  • You need NVarchar because all are in Europe which does not use ASCII by default, you not only need NVarchar you also need the correct collation and code page for each country or you will run into character conversion issues.  When you are getting character conversion all other issues are not relevant because you don't have an application.  SQL Server and other RDBMS use UCS-2 instead of UTF16 or UTF 8 which makes it complicated to deal with all accents and other language specific issues.  Run a search for collation and code pages in SQL Server BOL, if you still have more questions I can answer them I have seen 32 languages application running without issues in Asp.net. 

     

    Wednesday, March 5, 2008 8:04 PM
  • Just to clarify, I'm using SQL Server 2000 and my default collation is SQL_Latin1_General_CP1_CI_AS.

    You two seem to be giving conflicting answers.
    Wednesday, March 5, 2008 8:19 PM
  • I am a localization expert that is if you do what I tell you you may never get character conversion, the default collation is not relevant in SQL Server 2000 and 2005 because you can use the collation on database level, table level, column level in SQL Server 2000.  In SQL Server 2005 you can also use the collation in your DML(data manipulation langauge), that is in your queries.  So use Nvarchar and add relevant collation as needed.

     

    Wednesday, March 5, 2008 8:26 PM
  • Interesting.

    You mention character conversion as a potential problem when using varchar instead of nvarchar. Can you describe a scenario where this might happen?

    The reason I ask is because my application's use of this textual data is somewhat limited. One use is what ammounts to a large log table that records incoming and outgoing messages, the text of which is targeted or received from these various countries. We won't really be querying this text, for the most part.

    Also, can you provide any documentation, in addition to the BOL, that clearly describes the relationship between collation, code pages, and the storage of internationalized text?
    Wednesday, March 5, 2008 9:11 PM
  • I read the following article from BOL and it made more sense to me:

    http://msdn2.microsoft.com/en-us/library/aa214408(SQL.80,printer).aspx

    So I guess I have to go with nvarchar. Sigh...

    Any tips on going from varchar to nvarchar? Simply ALTER TABLE, or should I be worried? Smile
    Wednesday, March 5, 2008 9:28 PM
  • In 2000 it is easy use taskpad to make sure your MDF file is big enough for the multi bytes and right click and ALTER  TABLE Enterprise Manager will create a new table for you.  If everything is in one table you could use collation precedence, I think is it a predicate run a search for it in the BOL.  Here are some links as your requested.

     

    http://en.wikipedia.org/wiki/UTF-16

     

    http://www.unicode.org/faq/basic_q.html#25


    http://blogs.msdn.com/michkap/archive/2005/05/11/416552.aspx

     

    Wednesday, March 5, 2008 9:35 PM