none
Why are there so many(106) Hungarian collations? RRS feed

  • Question

  • I wonder how many Hungarians around the world really know which one to use!?

    SELECT name, description = LEFT(description, 40)
    FROM   sys.fn_HelpCollations() 
    WHERE  name LIKE '%HUN%' ORDER by name;
    /*
    Hungarian_100_BIN	Hungarian-100, binary sort
    Hungarian_100_BIN2	Hungarian-100, binary code point compari
    Hungarian_100_CI_AI	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AI_KS	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AI_KS_SC	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AI_KS_WS	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AI_KS_WS_SC	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AI_SC	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AI_WS	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AI_WS_SC	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AS	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AS_KS	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AS_KS_SC	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AS_KS_WS	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AS_KS_WS_SC	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AS_SC	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AS_WS	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CI_AS_WS_SC	Hungarian-100, case-insensitive, accent-
    Hungarian_100_CS_AI	Hungarian-100, case-sensitive, accent-in
    Hungarian_100_CS_AI_KS	Hungarian-100, case-sensitive, accent-in
    Hungarian_100_CS_AI_KS_SC	Hungarian-100, case-sensitive, accent-in
    Hungarian_100_CS_AI_KS_WS	Hungarian-100, case-sensitive, accent-in
    Hungarian_100_CS_AI_KS_WS_SC	Hungarian-100, case-sensitive, accent-in
    Hungarian_100_CS_AI_SC	Hungarian-100, case-sensitive, accent-in
    Hungarian_100_CS_AI_WS	Hungarian-100, case-sensitive, accent-in
    Hungarian_100_CS_AI_WS_SC	Hungarian-100, case-sensitive, accent-in
    Hungarian_100_CS_AS	Hungarian-100, case-sensitive, accent-se
    Hungarian_100_CS_AS_KS	Hungarian-100, case-sensitive, accent-se
    Hungarian_100_CS_AS_KS_SC	Hungarian-100, case-sensitive, accent-se
    Hungarian_100_CS_AS_KS_WS	Hungarian-100, case-sensitive, accent-se
    Hungarian_100_CS_AS_KS_WS_SC	Hungarian-100, case-sensitive, accent-se
    Hungarian_100_CS_AS_SC	Hungarian-100, case-sensitive, accent-se
    Hungarian_100_CS_AS_WS	Hungarian-100, case-sensitive, accent-se
    Hungarian_100_CS_AS_WS_SC	Hungarian-100, case-sensitive, accent-se
    Hungarian_BIN	Hungarian, binary sort
    Hungarian_BIN2	Hungarian, binary code point comparison 
    Hungarian_CI_AI	Hungarian, case-insensitive, accent-inse
    Hungarian_CI_AI_KS	Hungarian, case-insensitive, accent-inse
    Hungarian_CI_AI_KS_WS	Hungarian, case-insensitive, accent-inse
    Hungarian_CI_AI_WS	Hungarian, case-insensitive, accent-inse
    Hungarian_CI_AS	Hungarian, case-insensitive, accent-sens
    Hungarian_CI_AS_KS	Hungarian, case-insensitive, accent-sens
    Hungarian_CI_AS_KS_WS	Hungarian, case-insensitive, accent-sens
    Hungarian_CI_AS_WS	Hungarian, case-insensitive, accent-sens
    Hungarian_CS_AI	Hungarian, case-sensitive, accent-insens
    Hungarian_CS_AI_KS	Hungarian, case-sensitive, accent-insens
    Hungarian_CS_AI_KS_WS	Hungarian, case-sensitive, accent-insens
    Hungarian_CS_AI_WS	Hungarian, case-sensitive, accent-insens
    Hungarian_CS_AS	Hungarian, case-sensitive, accent-sensit
    Hungarian_CS_AS_KS	Hungarian, case-sensitive, accent-sensit
    Hungarian_CS_AS_KS_WS	Hungarian, case-sensitive, accent-sensit
    Hungarian_CS_AS_WS	Hungarian, case-sensitive, accent-sensit
    Hungarian_Technical_100_BIN	Hungarian-Technical-100, binary sort
    Hungarian_Technical_100_BIN2	Hungarian-Technical-100, binary code poi
    Hungarian_Technical_100_CI_AI	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AI_KS	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AI_KS_SC	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AI_KS_WS	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AI_KS_WS_SC	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AI_SC	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AI_WS	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AI_WS_SC	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AS	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AS_KS	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AS_KS_SC	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AS_KS_WS	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AS_KS_WS_SC	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AS_SC	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AS_WS	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CI_AS_WS_SC	Hungarian-Technical-100, case-insensitiv
    Hungarian_Technical_100_CS_AI	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AI_KS	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AI_KS_SC	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AI_KS_WS	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AI_KS_WS_SC	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AI_SC	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AI_WS	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AI_WS_SC	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AS	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AS_KS	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AS_KS_SC	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AS_KS_WS	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AS_KS_WS_SC	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AS_SC	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AS_WS	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_100_CS_AS_WS_SC	Hungarian-Technical-100, case-sensitive,
    Hungarian_Technical_BIN	Hungarian-Technical, binary sort
    Hungarian_Technical_BIN2	Hungarian-Technical, binary code point c
    Hungarian_Technical_CI_AI	Hungarian-Technical, case-insensitive, a
    Hungarian_Technical_CI_AI_KS	Hungarian-Technical, case-insensitive, a
    Hungarian_Technical_CI_AI_KS_WS	Hungarian-Technical, case-insensitive, a
    Hungarian_Technical_CI_AI_WS	Hungarian-Technical, case-insensitive, a
    Hungarian_Technical_CI_AS	Hungarian-Technical, case-insensitive, a
    Hungarian_Technical_CI_AS_KS	Hungarian-Technical, case-insensitive, a
    Hungarian_Technical_CI_AS_KS_WS	Hungarian-Technical, case-insensitive, a
    Hungarian_Technical_CI_AS_WS	Hungarian-Technical, case-insensitive, a
    Hungarian_Technical_CS_AI	Hungarian-Technical, case-sensitive, acc
    Hungarian_Technical_CS_AI_KS	Hungarian-Technical, case-sensitive, acc
    Hungarian_Technical_CS_AI_KS_WS	Hungarian-Technical, case-sensitive, acc
    Hungarian_Technical_CS_AI_WS	Hungarian-Technical, case-sensitive, acc
    Hungarian_Technical_CS_AS	Hungarian-Technical, case-sensitive, acc
    Hungarian_Technical_CS_AS_KS	Hungarian-Technical, case-sensitive, acc
    Hungarian_Technical_CS_AS_KS_WS	Hungarian-Technical, case-sensitive, acc
    Hungarian_Technical_CS_AS_WS	Hungarian-Technical, case-sensitive, acc
    SQL_Hungarian_CP1250_CI_AS	Hungarian, case-insensitive, accent-sens
    SQL_Hungarian_CP1250_CS_AS	Hungarian, case-sensitive, accent-sensit
    */


    Kalman Toth SQL 2008 GRAND SLAM
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012


    Tuesday, December 4, 2012 12:08 AM
    Moderator

Answers

  • For any given collation designator there are in SQL 2008 18 different collations: two binary collations, and then there are all 16 combinations of case-, accent-, kana- and width-sensitity. In SQL 2012 this explodes even more, with the addition of surrogate-aware collations.

    It is correct that kana- and width-sensitivity relate to East Asian scripts, and particularly Japanese. However, all Windows collations cover the full Unicode set, and even if your base language is Hungarian, you may still need to work with Japanese text.

    But it is true that for Hungarian there are unusually many collations, since there is both Hungarian and Hungarian_Technical. I cannot answer why it is so, but there is probably a legacy reason for it.

    What I can is that a couple of years ago, I ran a crazy performance test to test all collations in SQL 2008. I had an idle server available, and the test ran for 8-9 days. And Hungarian_Technical came out as the slowest as I recall.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, December 5, 2012 8:44 AM
  • Since Japanese and Hungarian use different code points, there is not really conflict between them. It should be easier to co-sort Hungarian and Japanese than Hungarian and Swedish. (We have different ideas on how to sort Ü and Ö.) Or at least Japanese written in Hiragana/Katakani. Since they share the Kanji characters with Chinese, there may be some conflicts there.

    But you are right, for 99.999% of the uses cases for a Western collation, it is of no importance whether the collation is kana- or width-sensitive or not.

    I think it would have been less confusing, if the Windows collations had declared something like:

       col nvarchar(200) Finnish_Swedish(CI, AS)

    Then you would not see the same crazy list that you do today. A syntax like this could also permit as to say:

      col nvarchar(200) database_default(CS)

    That is, I would the default collation family, but I want the column to be case-sensitive.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, December 5, 2012 9:45 AM
  • Thanks Erland. It is simply shocking. In kindergarten you learn that A Á O Ó and Ö ? is the order in the alphabet. That is how the phonebooks, dictionaries, & directories organized. You cannot possibly "sort" a document mixing A Á at will.

    I don't have access to an Hungarian phonebook, so I can't speak for that part. But as I said, my dictionary co-sorts A/Á etc. And I should clarify that this is an Hungarian work. It is not an idea of some Swedish publisher.

    Just because there is an alphabet does not mean that there is a simple order. Or for that matter that the alphabet stays the same. When I was a kid, I learnt that the Swedish alphabet has 28 letters. A couple of years back I entered a classroom for lower-grade pupils and they had pictures on the wall for the alphabet. I noticed that there were 29 pictures: there was a specific one for W. But at the time dictionaries always co-sorted V and W. Since then, the Swedish Academy has published a more recent version of their dictionary and which has separate entries for V and W. So maybe now, we have 29 letters, and the next version of SQL Server will offers collations like Modern_Swedish.

    Note also that dictionaries and phonebooks may have different conventions. This is the case in German, and I doubt that the Swedish phonebook will have separate entries for V and W. Then again, phonebooks are becoming a thing of the past...


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Thursday, December 6, 2012 9:45 AM
  • Ah, thanks Kalman for making the effort to sort this out. That was an interesting tidbit to know!

    The rules for languages change over the time, and there tends to be dissent from at least part of the population when new rules are introduced. It typically does not help if the new rules are imposed by a dictatorial regime.

    The set of collations in SQL Servers offers a few more examples: Simplified vs. Traditional Chinese, Modern vs. Traditional Spanish. And Latin1_General vs. German_Phonebook.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Thursday, December 6, 2012 11:25 AM

All replies

  • Isn't this a mountain out of a molehill?

    Try '%french%', '%chinese%'

    A regular language has about 52 due to the various combination of flags, doubled for the "100" (sql server 2008) specific tweaks from 2005. You may also see the SQL_ variations for collations that existed since SQL 7/2000.

    The only thing I see different in the Hungarian language is a "technical" variant (doubling from 50+ to 100+) - a linguist or Hungarian may be better suited to answer that.  As you can see, the Chinese language has even more variants, predominantly the Simplified and Traditional branches, plus various stroke orderings.

    Tuesday, December 4, 2012 2:22 AM
  • The hungarian language might be complicated and hence so much of collations included :) . just curious, why worry about collations ? is there any issues for you with having so many hungary collations?

    Please mark as helpful and propose as answer if you find this as correct!!! Thanks, Rakesh.

    Tuesday, December 4, 2012 5:29 AM
  • Just curious if anyone in entire Hungary knows which one to pick out of the 106? Hungarian is a Latin alphabet with some accented letters like ő.Thanks.

    Related thread:

    http://social.msdn.microsoft.com/Forums/en/transactsql/thread/ffeb685e-fca3-4071-a5f2-3ea699c4fa01


    Kalman Toth SQL 2008 GRAND SLAM
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012



    Tuesday, December 4, 2012 7:20 AM
    Moderator
  • I do hope that there is at least one person in the world who understands collations....

    "Hungarian_100_CI_AI_KS Hungarian-100, case-insensitive, accent-insensitive, kanatype-sensitive,..."

    But Hungarian is an extended Latin alphabet, it has nothing to do with Japanase Kana alphabet.


    Kalman Toth SQL 2008 GRAND SLAM
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012

    Wednesday, December 5, 2012 8:12 AM
    Moderator
  • For any given collation designator there are in SQL 2008 18 different collations: two binary collations, and then there are all 16 combinations of case-, accent-, kana- and width-sensitity. In SQL 2012 this explodes even more, with the addition of surrogate-aware collations.

    It is correct that kana- and width-sensitivity relate to East Asian scripts, and particularly Japanese. However, all Windows collations cover the full Unicode set, and even if your base language is Hungarian, you may still need to work with Japanese text.

    But it is true that for Hungarian there are unusually many collations, since there is both Hungarian and Hungarian_Technical. I cannot answer why it is so, but there is probably a legacy reason for it.

    What I can is that a couple of years ago, I ran a crazy performance test to test all collations in SQL 2008. I had an idle server available, and the test ran for 8-9 days. And Hungarian_Technical came out as the slowest as I recall.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, December 5, 2012 8:44 AM
  • Thanks Erland,

    Just in case we mix Hungarian with Japanese, we need a special sort for that? Or this is just legacy, it has nothing to do with the practical world?

    DECLARE @HUNJAP nvarchar(128) = N'Pásztó無いありませんMátrakeresztes';
    SELECT @HUNJAP;  -- Pásztó無いありませんMátrakeresztes
    If you twist my arms, I admit to it, you can mix Hungarian with Japanase on  pure combinatorics basis like above.  So I would just sort them binary. I just can't imagine somebody wants a special sort for these kind of mixed western-oriental language text?

    Kalman Toth SQL 2008 GRAND SLAM
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012

    Wednesday, December 5, 2012 8:55 AM
    Moderator
  • Since Japanese and Hungarian use different code points, there is not really conflict between them. It should be easier to co-sort Hungarian and Japanese than Hungarian and Swedish. (We have different ideas on how to sort Ü and Ö.) Or at least Japanese written in Hiragana/Katakani. Since they share the Kanji characters with Chinese, there may be some conflicts there.

    But you are right, for 99.999% of the uses cases for a Western collation, it is of no importance whether the collation is kana- or width-sensitive or not.

    I think it would have been less confusing, if the Windows collations had declared something like:

       col nvarchar(200) Finnish_Swedish(CI, AS)

    Then you would not see the same crazy list that you do today. A syntax like this could also permit as to say:

      col nvarchar(200) database_default(CS)

    That is, I would the default collation family, but I want the column to be case-sensitive.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, December 5, 2012 9:45 AM
  • Thanks Erland. 2 more questions:

    1. If a company ( doing business in EU, Ukraine & Russia) is installing SQL Server in Hungary which collation supposed to pick out of the 106?

    2. Why many of "AS" collations cannot sort the alphabet the right way?  A, Á, B, C, D...

    DECLARE @HU TABLE (Word nvarchar(30));
    INSERT @HU VALUES (N'Őa'),(N'Óz'),(N'Öa'),(N'Oz'), (N'Áa'),(N'Az') ; 
    -- Expected sort order A Á O Ó Ö Ő 
    SELECT * FROM @HU ORDER BY Word COLLATE  SQL_Hungarian_CP1250_CI_AS;
    SELECT * FROM @HU ORDER BY Word COLLATE  HUNGARIAN_CI_AS;
    
    /*
    Word
    Áa
    Az
    Oz
    Óz
    Öa
    Őa
    */
    This was the issue with the referenced thread above.

    Kalman Toth SQL 2008 GRAND SLAM
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012

    Wednesday, December 5, 2012 12:27 PM
    Moderator
  • 1. If a company ( doing business in EU, Ukraine & Russia) is installing SQL Server in Hungary which collation supposed to pick out of the 106?

    If you need to handle multi-national data, life gets difficult. The most
    pragmatic solution is probably to choose a collation that meets the
    experience of the majority of the users and hope that speakers of other
    languages don't get to upset. Or sort in the client, based on the user's
    regional settings.

    Of course, collation is more than sorting. Searching, for instance, and
    here different rules can also cause problem. Overall, this is a non-trivial problem.

    2. Why many of "AS" collations cannot sort the alphabet the right way?  A, > Á, B, C, D...

    Apparently there is more than one way to sort Hungarian. I did not notice first the thread you referred to, but I've added an answer to that thread. Short story: my dictionary co-sorts the accented vowels, the Hungarian_Technical collations do not.

    By the way, acute accent in Hungarian signifies that the vowel is long. A convention shared with Slovak and Czech.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, December 5, 2012 11:00 PM
  • >Apparently there is more than one way to sort Hungarian

    I am not aware of that. Of course I am not aware of what is technical Hungarian either. Yes Á (like the doctor: say aaah) is long and the 2nd letter after A. This is some sort of bug:

    DECLARE @HU TABLE (Word nvarchar(30));
    INSERT @HU VALUES (N'Á'),(N'Áa'),(N'Az'), (N'Áaa'),(N'Azz'),(N'A') ; 
    -- Expected sort order A A A  Á  Á  Á
    SELECT * FROM @HU ORDER BY Word COLLATE  SQL_Hungarian_CP1250_CI_AS;
    SELECT * FROM @HU ORDER BY Word COLLATE  HUNGARIAN_CI_AS;
    
    /*
    A
    Á
    Áa
    Áaa
    Az
    Azz
    */

    This is the Hungarian alphabet (kindergarten stuff):

    A Á B C Cs D Dz Dzs E É F G Gy H I Í J K L Ly M N
    Ny O Ó Ö Ő P (Q) R S Sz T Ty U Ú Ü Ű V (W) (X) (Y) Z Zs

    Kalman Toth SQL 2008 GRAND SLAM
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012


    Thursday, December 6, 2012 1:19 AM
    Moderator
  • You should have read more of that Wikipedia article you linked to. It says:

    While the characters with diacritical marks are considered separate letters, vowels that differ only in length are treated the same when ordering words. Therefore, for example, the pairs O/Ó and Ö/Ő are not distinguished in ordering, but Ö follows O.

    The polygraphic consonant signs are treated as single letters.

    See more at http://en.wikipedia.org/wiki/Hungarian_alphabet#Alphabetical_ordering_.28collation.29

    As for more ways to sort a language, this is not unheard of. German has two possibilities for how to sort ÄÖÜ.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Thursday, December 6, 2012 8:04 AM
  • >Therefore, for example, the pairs O/Ó and Ö/Ő are not distinguished in ordering, but Ö follows O.

    Thanks Erland. It is simply shocking. In kindergarten you learn that A Á O Ó and Ö Ő is the order in the alphabet. That is how the phonebooks, dictionaries, & directories organized. You cannot possibly "sort" a document mixing A Á at will.

    We just have to remember to use technical collations like HUNGARIAN_TECHNICAL_CI_AS:

    DECLARE @HU TABLE (Word nvarchar(30));
    INSERT @HU VALUES (N'Á'),(N'Áa'),(N'Az'), (N'Áaa'),(N'Azz'),(N'A') ; 
    -- Expected sort order A A A  Á  Á  Á
    SELECT * FROM @HU ORDER BY Word COLLATE  HUNGARIAN_TECHNICAL_CI_AS;
    
    /*
    A
    Az
    Azz
    Á
    Áa
    Áaa
    */


    Kalman Toth SQL 2008 GRAND SLAM
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012

    Thursday, December 6, 2012 8:42 AM
    Moderator
  • Thanks Erland. It is simply shocking. In kindergarten you learn that A Á O Ó and Ö ? is the order in the alphabet. That is how the phonebooks, dictionaries, & directories organized. You cannot possibly "sort" a document mixing A Á at will.

    I don't have access to an Hungarian phonebook, so I can't speak for that part. But as I said, my dictionary co-sorts A/Á etc. And I should clarify that this is an Hungarian work. It is not an idea of some Swedish publisher.

    Just because there is an alphabet does not mean that there is a simple order. Or for that matter that the alphabet stays the same. When I was a kid, I learnt that the Swedish alphabet has 28 letters. A couple of years back I entered a classroom for lower-grade pupils and they had pictures on the wall for the alphabet. I noticed that there were 29 pictures: there was a specific one for W. But at the time dictionaries always co-sorted V and W. Since then, the Swedish Academy has published a more recent version of their dictionary and which has separate entries for V and W. So maybe now, we have 29 letters, and the next version of SQL Server will offers collations like Modern_Swedish.

    Note also that dictionaries and phonebooks may have different conventions. This is the case in German, and I doubt that the Swedish phonebook will have separate entries for V and W. Then again, phonebooks are becoming a thing of the past...


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Thursday, December 6, 2012 9:45 AM
  • Thanks Erland. I just talk to a physicist in Budapest about the topic. As it turns out before Communism the sorting was alphabetic. One "invention" of Communism was the mumbo-jumbo sorting (Ácy Ácz Ács) like below.

    DECLARE @HU TABLE (Word nvarchar(30));
    INSERT @HU VALUES (N'Á'),(N'Áa'),(N'Az'),(N'Ácy'),(N'Azz'),(N'A'), (N'Ácz'),(N'Ács') ; 
    -- Expected sort order A A A  Á  Á  Á
    SELECT * FROM @HU ORDER BY Word COLLATE  HUNGARIAN_CI_AS;
    
    /*
    Word
    A
    Á
    Áa
    Ácy
    Ácz
    Ács
    Az
    Azz
    */

    It appears to me that computer people did not like the Communist way of sorting and introduced the "technical" collations, thus doubling the number of Hungarian collations (106!).

    Let's just remember: to be safe use HUNGARIAN_TECHNICAL_CI_AS or HUNGARIAN_TECHNICAL_CS_AS .


    Kalman Toth SQL 2008 GRAND SLAM
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012


    Thursday, December 6, 2012 11:03 AM
    Moderator
  • Ah, thanks Kalman for making the effort to sort this out. That was an interesting tidbit to know!

    The rules for languages change over the time, and there tends to be dissent from at least part of the population when new rules are introduced. It typically does not help if the new rules are imposed by a dictatorial regime.

    The set of collations in SQL Servers offers a few more examples: Simplified vs. Traditional Chinese, Modern vs. Traditional Spanish. And Latin1_General vs. German_Phonebook.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Thursday, December 6, 2012 11:25 AM