none
On SQL_Latin1_General_CP1_CI_AS SQL Server 2008 Default Collation

Answers

  • Hi

    If it  is  nothing specific I choose default Windows Collation


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Tuesday, August 17, 2010 7:36 AM
    Answerer
  • In my opinion, it is not a good choice.  I believe you should be using the SQL collations only for backward compatibility.  That's Microsoft's recommendation as well.  And I guess Microsoft made the US English default to be a SQL collation because that's the what SQL Setup did in prior releases.  I wish they hadn't done that.  Or, at least, made it clearer that in new installations that weren't going to depend on the way a previous version of SQL worked, like for example, if you were doing a brand new install on a new server of 2008R2, but were planning to move a database from another server that depended on the server having a SQL collation.

    A big reason to not use the SQL collations is that SQL can then have a different sorting order than your client application which will be using a Windows collation.  Also, if you use a SQL collation, you can get different sort order and comparison results if you store data in a varchar column or a nvarchar column.  Even if the data that you are storing fits perfectly well in 8 bits.  (That's because unicode data is always compared and sorted with a windows collation, no matter what you specify the collation is.)  For example:

     CREATE TABLE #TEMP (name varchar(80) Collate SQL_Latin1_General_CP1_CI_AS)
     INSERT INTO #TEMP values ('Malteser Schloßschule')
     INSERT INTO #TEMP values ('Malteser Schlossschule')
    
     CREATE TABLE #TEMP1 (name nvarchar(80) Collate SQL_Latin1_General_CP1_CI_AS)
     INSERT INTO #TEMP1 values ('Malteser Schloßschule')
     INSERT INTO #TEMP1 values ('Malteser Schlossschule')
    
    -- Now the data is there correctly in both tables
     SELECT * FROM #TEMP
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
     SELECT * FROM #TEMP1
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    
    -- but it sorts and compares differently in the two tables 
     SELECT * FROM #TEMP WHERE name ='Malteser Schlossschule'
    -- Results :- 'Malteser Schlossschule'
    
     SELECT * FROM #TEMP1 WHERE name ='Malteser Schlossschule'
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    Drop Table #TEMP
    Drop TABLE #TEMP1
    go
    
    -- Above problem does not exist with windows collation
     CREATE TABLE #TEMP (name varchar(80) Collate Latin1_General_CI_AS)
     INSERT INTO #TEMP values ('Malteser Schloßschule')
     INSERT INTO #TEMP values ('Malteser Schlossschule')
    
     CREATE TABLE #TEMP1 (name nvarchar(80) Collate Latin1_General_CI_AS)
     INSERT INTO #TEMP1 values ('Malteser Schloßschule')
     INSERT INTO #TEMP1 values ('Malteser Schlossschule')
    
    -- Now the data is there correctly in both tables
     SELECT * FROM #TEMP
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
     SELECT * FROM #TEMP1
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    
    -- But now it sorts and compares correctly
     SELECT * FROM #TEMP WHERE name ='Malteser Schlossschule'
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    
     SELECT * FROM #TEMP1 WHERE name ='Malteser Schlossschule'
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    Drop Table #TEMP
    Drop TABLE #TEMP1
    
    Tom

    Tuesday, August 17, 2010 7:40 AM
  • Here is a good reason: the most prevalent collation is SQL_Latin1_General_CP1_CI_AS  in SQL Server installations. So if you have to work with data from other installations you lessen the likelyhood of encountering the nightmare of mixed collations (which I had due to "inheritence" and external data).


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

    I certainly agree.  Mixing collations within the same instance of SQL Server is a source of virtually endless problems unless the entire system was carefully written to exist with multiple collations - and doing that is also a big hassle.  So I would agree, if you expect to get databases from other sources and include them in this instance, then in the US by far the most common collation is SQL_Latin1_General_CP1_CI_AS and it makes sense to pick that. 

    My experience has been that that hasn't happened much to me.  If we got databases from different installations it mostly was either a database that was mostly self-contained, in which case we could create a new instance of SQL server that had all the options including collation set the way this new database worked best, or, it contained data that needed to be integrated into existing databases and/or systems.   But in that case, we usually needed a DTL process anyway, so that offered an opportunity to change the collation.

    Also, I did most of my work in the US, but with a product that was sold primarily to the international market.  And it's only in the US that a SQL collation is the default.  Almost everywhere else in the world, the default is a windows collation.

    I guess the best answer to your question is the standard database answer:  "It depends".

    Tom

    Tuesday, August 17, 2010 3:50 PM
  • No, SQL_Latin1_General_CP1_CI_AS is not a good choice for a default collation. True, if you only work with varchar data, and never use anything resemblent of Unicode, you can better performance. A statment like:

    SELECT * FROM tbl WHERE col LIKE '%abc%'

    will perform about seven times faster with an SQL collation than with a
    Windows collation.

    But there is also a devilish trap with SQL collation, which is a lot easier to walk into that you might believe. Consider:

    SELECT * FROM tbl WHERE indexedvarcharcol = @nvarcharval

    Since varchar has lower precedence than nvarchar, you will get an implicit conversion. With a Windows collations, varchar is just a subset of nvarchar, so the index is still usable, althouught can expect an overhead of a factor 2 or 3.

    But with an SQL collations there are differences in the rules, so the index cannot be seeked and the cost can be a factor of 100 or 1000 or more depending on the size of the table.

    Only a fool would do that! No, as I said, it is easier than you think. Many client tools work with Unicode by default. Java programmers are
    most prone to be victims to this it seems, but .Net also offers a lot
    of occassions.

    It's ironic, that here is an issue related to internationalisation where US customers are the most likely victims. As Tom said, English (United States) is the only locale that gives you an SQL collation by default. There is all reason to override it.

    I have two Connect items in this area, one calling for a new default for US English, and one calling for no defaults at all.
    https://connect.microsoft.com/SQLServer/feedback/details/357974/make-latin1-general-100-ci-as-the-default-collation-for-us-english
    https://connect.microsoft.com/SQLServer/feedback/details/357973/do-not-provide-a-default-server-collation-at-setup


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Tuesday, August 17, 2010 7:50 PM
  • Do you think SQL_Latin1_General_CP1_CI_AS  is the obvious choice for such an undertaking in America? Within an IT shop? Within an Enterprise?

    I think once a collation precedent has been established, it’s generally better to stick with that collation going forward unless one has plans to remediate the legacy installations.  Microsoft probably chose to make the SQL_Latin1_General_CP1_CI_AS backward compatible collation the default to help existing US customers avoid the mixed collation ugliness.  Right or wrong, I would continue to use SQL_Latin1_General_CP1_CI_AS for new instances here in the US if existing instances with that collation exist.  That avoids subsequent collation problems as a result of database consolidation.  In a new or completely isolated environment, I would certainly choose a Windows collation, though. 

    PS: Can you fix your link in your post above? It gives invalid page. The following link works:

    http://weblogs.sqlteam.com/dang/archive/2009/07/26/Collation-____-Part-1.aspx

    It looks the 4-letter word gets scrubbed during posting:-)  Try:

    SELECT CAST(0x687474703A2F2F7765626C6F67732E73716C7465616D2E636F6D2F64616E672F617263686976652F323030392F30372F32362F436F6C6C6174696F6E2D48656C6C2D506172742D312E61737078AS varchar(MAX))
    

    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/
    Wednesday, August 18, 2010 1:12 AM
  • > Here is the point though: you, the SQL industry expert, has the wisdom and courage to pick any collation you like. However, if I were a*junior DBA in Boston*, I would play it safe: pick what Microsoft is "telling me," the default, SQL_Latin1_General_CP1_CI_AS collation.

    Which is why I've submitted suggestions on Connect to change the defaults. Or have no defaults at all.

    Dan said: Microsoft probably chose to make the SQL_Latin1_General_CP1_CI_AS backward compatible collation the default to help existing US customers avoid the mixed collation ugliness.

    However, many people elsewhere face the problem, not the least in other English-speaking countries. Sometimes the server's locale has been changed to "English (Australia)" and sometimes it has not. Over where I live, most people realise they want something related to Swedish, but I see servers with SQL_Latin1_General_CP1_CI_AS here as well.

    As for what are "extended ASCII", you need Latin-2 to cover all characters in Hungarian in an 8-bit character set. The double-accented characters are missing from Latin-1.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Wednesday, August 18, 2010 7:25 AM
  • */ -- ASCII function does not work for the Hungarian UNICODE characters SELECT ASCII ('O' ), ASCII ('Ö' ), ASCII (N'Ő' ) -- 79 214 79

    Try this:

     

    SELECT ASCII('O'), ASCII('Ö'), ASCII(N'Ő' COLLATE Hungarian_CS_AS)
    


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Wednesday, August 18, 2010 10:13 PM
  • So, ASCII implicitly translates the UNICODE character  'Ő' into 'O' under the default SQL_Latin1_General_CP1_CI_AS collation, and extended ASCII if the appropriate collation is specified.

    It's not really the ASCII function as such. ASCII accepts varchar input, so N'Ő'is converted to varchar, in which case a fallback character is used. (And this is 'O', although 'Ö' would be a lot better choice.)

    This happens in several other cases. Try this:

    SELECT N'Ő', cast(N'Ő' as varchar), 'Ő'
    

     

     


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Thursday, August 19, 2010 10:06 PM

All replies

  • Hi

    If it  is  nothing specific I choose default Windows Collation


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Tuesday, August 17, 2010 7:36 AM
    Answerer
  • In my opinion, it is not a good choice.  I believe you should be using the SQL collations only for backward compatibility.  That's Microsoft's recommendation as well.  And I guess Microsoft made the US English default to be a SQL collation because that's the what SQL Setup did in prior releases.  I wish they hadn't done that.  Or, at least, made it clearer that in new installations that weren't going to depend on the way a previous version of SQL worked, like for example, if you were doing a brand new install on a new server of 2008R2, but were planning to move a database from another server that depended on the server having a SQL collation.

    A big reason to not use the SQL collations is that SQL can then have a different sorting order than your client application which will be using a Windows collation.  Also, if you use a SQL collation, you can get different sort order and comparison results if you store data in a varchar column or a nvarchar column.  Even if the data that you are storing fits perfectly well in 8 bits.  (That's because unicode data is always compared and sorted with a windows collation, no matter what you specify the collation is.)  For example:

     CREATE TABLE #TEMP (name varchar(80) Collate SQL_Latin1_General_CP1_CI_AS)
     INSERT INTO #TEMP values ('Malteser Schloßschule')
     INSERT INTO #TEMP values ('Malteser Schlossschule')
    
     CREATE TABLE #TEMP1 (name nvarchar(80) Collate SQL_Latin1_General_CP1_CI_AS)
     INSERT INTO #TEMP1 values ('Malteser Schloßschule')
     INSERT INTO #TEMP1 values ('Malteser Schlossschule')
    
    -- Now the data is there correctly in both tables
     SELECT * FROM #TEMP
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
     SELECT * FROM #TEMP1
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    
    -- but it sorts and compares differently in the two tables 
     SELECT * FROM #TEMP WHERE name ='Malteser Schlossschule'
    -- Results :- 'Malteser Schlossschule'
    
     SELECT * FROM #TEMP1 WHERE name ='Malteser Schlossschule'
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    Drop Table #TEMP
    Drop TABLE #TEMP1
    go
    
    -- Above problem does not exist with windows collation
     CREATE TABLE #TEMP (name varchar(80) Collate Latin1_General_CI_AS)
     INSERT INTO #TEMP values ('Malteser Schloßschule')
     INSERT INTO #TEMP values ('Malteser Schlossschule')
    
     CREATE TABLE #TEMP1 (name nvarchar(80) Collate Latin1_General_CI_AS)
     INSERT INTO #TEMP1 values ('Malteser Schloßschule')
     INSERT INTO #TEMP1 values ('Malteser Schlossschule')
    
    -- Now the data is there correctly in both tables
     SELECT * FROM #TEMP
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
     SELECT * FROM #TEMP1
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    
    -- But now it sorts and compares correctly
     SELECT * FROM #TEMP WHERE name ='Malteser Schlossschule'
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    
     SELECT * FROM #TEMP1 WHERE name ='Malteser Schlossschule'
    -- Results :- Malteser Schloßschule
    --        Malteser Schlossschule
    Drop Table #TEMP
    Drop TABLE #TEMP1
    
    Tom

    Tuesday, August 17, 2010 7:40 AM
  • A big reason to not use the SQL collations is that SQL can then have a different sorting order than your client application which will be using a Windows collation.  Also, if you use a SQL collation, you can get different sort order and comparison results if you store data in a varchar column or a nvarchar column.  Even if the data that you are storing fits perfectly well in 8 bits.  


    Tom,

    Thanks for your valuable reasoning. Nonetheless, I went through 18 years of SQL Server experience always picking the American default SQL collation as "safe" choice. Survived. Here is a good reason: the most prevalent collation is SQL_Latin1_General_CP1_CI_AS  in SQL Server installations. So if you have to work with data from other installations you lessen the likelyhood of encountering the nightmare of mixed collations(which I had due to "inheritence" and external data).


    Kalman Toth SQL SERVER & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016



    Tuesday, August 17, 2010 11:47 AM
    Moderator
  • Is that the default offered by SQL Server setup?

    It depends on the OS language.  In the UK, the Windows collation is the default but the SQL collation is the default in the US.  I blogged a bit about collations (http://weblogs.sqlteam.com/dang/archive/2009/07/26/Collation-____-Part-1.aspx).  The bottom line is that it generally best to standardize on a single collation to avoid subsequent pain.


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/
    Tuesday, August 17, 2010 12:23 PM
  • Here is a good reason: the most prevalent collation is SQL_Latin1_General_CP1_CI_AS  in SQL Server installations. So if you have to work with data from other installations you lessen the likelyhood of encountering the nightmare of mixed collations (which I had due to "inheritence" and external data).


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

    I certainly agree.  Mixing collations within the same instance of SQL Server is a source of virtually endless problems unless the entire system was carefully written to exist with multiple collations - and doing that is also a big hassle.  So I would agree, if you expect to get databases from other sources and include them in this instance, then in the US by far the most common collation is SQL_Latin1_General_CP1_CI_AS and it makes sense to pick that. 

    My experience has been that that hasn't happened much to me.  If we got databases from different installations it mostly was either a database that was mostly self-contained, in which case we could create a new instance of SQL server that had all the options including collation set the way this new database worked best, or, it contained data that needed to be integrated into existing databases and/or systems.   But in that case, we usually needed a DTL process anyway, so that offered an opportunity to change the collation.

    Also, I did most of my work in the US, but with a product that was sold primarily to the international market.  And it's only in the US that a SQL collation is the default.  Almost everywhere else in the world, the default is a windows collation.

    I guess the best answer to your question is the standard database answer:  "It depends".

    Tom

    Tuesday, August 17, 2010 3:50 PM
  • No, SQL_Latin1_General_CP1_CI_AS is not a good choice for a default collation. True, if you only work with varchar data, and never use anything resemblent of Unicode, you can better performance. A statment like:

    SELECT * FROM tbl WHERE col LIKE '%abc%'

    will perform about seven times faster with an SQL collation than with a
    Windows collation.

    But there is also a devilish trap with SQL collation, which is a lot easier to walk into that you might believe. Consider:

    SELECT * FROM tbl WHERE indexedvarcharcol = @nvarcharval

    Since varchar has lower precedence than nvarchar, you will get an implicit conversion. With a Windows collations, varchar is just a subset of nvarchar, so the index is still usable, althouught can expect an overhead of a factor 2 or 3.

    But with an SQL collations there are differences in the rules, so the index cannot be seeked and the cost can be a factor of 100 or 1000 or more depending on the size of the table.

    Only a fool would do that! No, as I said, it is easier than you think. Many client tools work with Unicode by default. Java programmers are
    most prone to be victims to this it seems, but .Net also offers a lot
    of occassions.

    It's ironic, that here is an issue related to internationalisation where US customers are the most likely victims. As Tom said, English (United States) is the only locale that gives you an SQL collation by default. There is all reason to override it.

    I have two Connect items in this area, one calling for a new default for US English, and one calling for no defaults at all.
    https://connect.microsoft.com/SQLServer/feedback/details/357974/make-latin1-general-100-ci-as-the-default-collation-for-us-english
    https://connect.microsoft.com/SQLServer/feedback/details/357973/do-not-provide-a-default-server-collation-at-setup


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Tuesday, August 17, 2010 7:50 PM
  • Do you think SQL_Latin1_General_CP1_CI_AS  is the obvious choice for such an undertaking in America? Within an IT shop? Within an Enterprise?

    I think once a collation precedent has been established, it’s generally better to stick with that collation going forward unless one has plans to remediate the legacy installations.  Microsoft probably chose to make the SQL_Latin1_General_CP1_CI_AS backward compatible collation the default to help existing US customers avoid the mixed collation ugliness.  Right or wrong, I would continue to use SQL_Latin1_General_CP1_CI_AS for new instances here in the US if existing instances with that collation exist.  That avoids subsequent collation problems as a result of database consolidation.  In a new or completely isolated environment, I would certainly choose a Windows collation, though. 

    PS: Can you fix your link in your post above? It gives invalid page. The following link works:

    http://weblogs.sqlteam.com/dang/archive/2009/07/26/Collation-____-Part-1.aspx

    It looks the 4-letter word gets scrubbed during posting:-)  Try:

    SELECT CAST(0x687474703A2F2F7765626C6F67732E73716C7465616D2E636F6D2F64616E672F617263686976652F323030392F30372F32362F436F6C6C6174696F6E2D48656C6C2D506172742D312E61737078AS varchar(MAX))
    

    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/
    Wednesday, August 18, 2010 1:12 AM
  • No, SQL_Latin1_General_CP1_CI_AS is not a good choice for a default collation. True, if you only work with varchar data, and never use anything resemblent of Unicode, you can better performance.


    Erland,

    Here is the point though: you, the SQL industry expert, have the wisdom and courage to pick any collation you like. However, if I were a junior DBA in Boston, I would play it safe: pick what Microsoft is "telling me," the default, SQL_Latin1_General_CP1_CI_AS collation.

    COLLATION is a very difficult issue. For example, it took years for me to realize that the Hungarian alphabet has:

     o ASCII Latin letters like 'O'

     o ASCII extended characters like 'Ö'

     o UNICODE characters like 'Ő'  (only 2, I thought they were extended ASCII)

     Guaranteed to trip 95% of the developers.


    Kalman Toth, SQL Server & Business Intelligence Training; Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016
    Wednesday, August 18, 2010 4:36 AM
    Moderator
  • > Here is the point though: you, the SQL industry expert, has the wisdom and courage to pick any collation you like. However, if I were a*junior DBA in Boston*, I would play it safe: pick what Microsoft is "telling me," the default, SQL_Latin1_General_CP1_CI_AS collation.

    Which is why I've submitted suggestions on Connect to change the defaults. Or have no defaults at all.

    Dan said: Microsoft probably chose to make the SQL_Latin1_General_CP1_CI_AS backward compatible collation the default to help existing US customers avoid the mixed collation ugliness.

    However, many people elsewhere face the problem, not the least in other English-speaking countries. Sometimes the server's locale has been changed to "English (Australia)" and sometimes it has not. Over where I live, most people realise they want something related to Swedish, but I see servers with SQL_Latin1_General_CP1_CI_AS here as well.

    As for what are "extended ASCII", you need Latin-2 to cover all characters in Hungarian in an 8-bit character set. The double-accented characters are missing from Latin-1.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Wednesday, August 18, 2010 7:25 AM
  • As for what are "extended ASCII", you need Latin-2 to cover all characters in Hungarian in an 8-bit character set. The double-accented characters are missing from Latin-1.

    I see the 2 Hungarian UNICODE characters on the Latin-2 (1 byte) chart: http://nl.ijs.si/gnusl/cee/charset.html .

    However, in T-SQL (and other software), they still require UNICODE (2-bytes) handling, or the alternative workaround of implicit conversion to Latin letters without accents (see ASCII function):

    DECLARE @HuUni nchar(2) = N'ŐŰ', @HuUni1byte char(2)= 'ŐŰ'
    SELECT [Unicode]=@HuUni, ConvertedToAscii=@HuUni1byte
    /*
    Unicode	ConvertedToAscii
    ŐŰ	OU
    */
    -- ASCII function does not work for the Hungarian UNICODE characters
    SELECT ASCII('O'), ASCII('Ö'), ASCII(N'Ő')
    -- 79	214	79
    

    Kalman Toth, SQL Server & Business Intelligence Training; Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016
    Wednesday, August 18, 2010 8:18 AM
    Moderator
  • */ -- ASCII function does not work for the Hungarian UNICODE characters SELECT ASCII ('O' ), ASCII ('Ö' ), ASCII (N'Ő' ) -- 79 214 79

    Try this:

     

    SELECT ASCII('O'), ASCII('Ö'), ASCII(N'Ő' COLLATE Hungarian_CS_AS)
    


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Wednesday, August 18, 2010 10:13 PM
  • Thanks Erland. Very interesting! And confusing at the same time...

    So, ASCII implicitly translates the UNICODE character  'Ő' into 'O' under the default SQL_Latin1_General_CP1_CI_AS collation, and extended ASCII if the appropriate collation is specified.

    SELECT ASCII('O'), ASCII('Ö'), ASCII(N'Ő' COLLATE Hungarian_CS_AS)
    -- 79	 214	 213
    

    Kalman Toth
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016

    Wednesday, August 18, 2010 10:33 PM
    Moderator
  • So, ASCII implicitly translates the UNICODE character  'Ő' into 'O' under the default SQL_Latin1_General_CP1_CI_AS collation, and extended ASCII if the appropriate collation is specified.

    It's not really the ASCII function as such. ASCII accepts varchar input, so N'Ő'is converted to varchar, in which case a fallback character is used. (And this is 'O', although 'Ö' would be a lot better choice.)

    This happens in several other cases. Try this:

    SELECT N'Ő', cast(N'Ő' as varchar), 'Ő'
    

     

     


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online:
    SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
    SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
    SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
    Thursday, August 19, 2010 10:06 PM
  • I am active in this forums for the last 2 months.............

    But till now i didn't observe this thread......... :)

     

    This is the first question in sql server quiz 2010 by erland... :)  I gave a reply there 1 hr back............

    Microsoft closed this connect issue.... Their reasoning is worst.....  Windows collation column is using seek, while dealing with 16-bit column. But SQL collation is using scan, while dealing with same 16-bit column. Reason is different. Not about bit length... :)

    https://connect.microsoft.com/SQLServer/feedback/details/229661/compare-with-unicode-constant-doesnt-use-index-if-collation-different-than-default?wa=wsignin1.0

     

     

    Between, is it right to reply this thread now?????????

    Saturday, October 02, 2010 3:13 PM
  • So, ASCII implicitly translates the UNICODE character  'Ő' into 'O' under the default SQL_Latin1_General_CP1_CI_AS collation, and extended ASCII if the appropriate collation is specified.

    SELECT ASCII('O'), ASCII('Ö'), ASCII(N'Ő' COLLATE Hungarian_CS_AS)
    -- 79	 214	 213
    

    As Erland mentioned already, it is not the ASCII() function that does any translating, it is simply a matter of data type conversion. Meaning, since the ASCII() function accepts a parameter of type VARCHAR, the data being passed in is converted to VARCHAR if it is not already that type. This would happen any time you pass NVARCHAR data into any VARCHAR parameter, or even into a VARCHAR variable or into a VARCHAR field in a table.

    Now, what any character gets translated into is really a matter of the Code Page associated with the collation of the destination field, or of the database default collation when using a) functions and not using the COLLATE clause, or b) local variables.

    In order to translate a Unicode character into a similar Extended ASCII character, the destination Code Page has to first have a similar character. The Latin1 collations (as well as others) use Code Page 1252. It seems that Code Page 1252 does not have that character, so it maps to a regular letter "O". But run the following to find the Hungarian Code Page:

    SELECT COLLATIONPROPERTY('Hungarian_CS_AS', 'CodePage');

    -- 1250

    Once you know the Code Page, then you can find the mapping by going here:

    ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/

    The "best fit" mappings are what Windows collations (in SQL Server and .NET and I guess Windows in general) use to map from Unicode into a Code Page. Great, but what are we looking for? Run the following:

    SELECT UNICODE(N'Ő'), CONVERT(BINARY(2), UNICODE(N'Ő'));

    -- 336 0x0150

    As expected, the decimal / INT Code Point is above the 255 limit of Extened ASCII. And we needed the hex / BINARY version of it to look it up in the mapping file. There is a mapping for Code Page 1250 which is what the Hungarian collation is using. Looking at the mapping, we see that it shows:

    0x0150	0xd5	;Latin Capital Letter O With Double Acute

    Convert that back to INT:

    SELECT CONVERT(INT, 0xD5);

    -- 213

    And we get the same "213" that you got when running "ASCII(N'Ő' COLLATE Hungarian_CS_AS)"


    Sunday, December 13, 2015 3:23 AM
  • I'm facing the issue of selecting the right Collation for a new Data Warehouse on a SQL Server 2016 instance. In spite of all the considerations applied on this thread, nowadays Microsoft has clarified that "SQL Server collations are still supported for backward compatibility, but should not be used for new development work".

    So for new environment the best direction is to use a Windows collation as Latin1_General_CI_AS instead of SQL_Latin1_General_CP1_CI_AS.

    Related links:

    https://docs.microsoft.com/en-us/sql/t-sql/statements/sql-server-collation-name-transact-sql

    Bye, Simone.

    Friday, July 07, 2017 1:51 PM
  • So for new environment the best direction is to use a Windows collation as Latin1_General_CI_AS instead of SQL_Latin1_General_CP1_CI_AS.

    I agree a Windows collation is the way to go for new development work in a new isolated database. In the case of new development that must happily coexist with legacy SQL collation databases and objects, it would be better to either stick with the SQL collation precedent or get rid of the SQL collations entirely. 

    Note that SQL collations are not deprecated (at least yet). I expect them to be around until sometime after semicolon statement terminators become mandatory (which is already formally deprecated).


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Friday, July 07, 2017 3:45 PM