locked
Excluded character from accent sensitivity RRS feed

  • Question

  • We just get an error from our Analysis Service when processing a dimension that include accent. Analysis found a duplicate key even if we are using Collation Accent insensitive. I made some tests to figured out what happens and found out this:

    Works as expected for e, é values

    select 'e' collate SQL_Latin1_General_CP1_CI_AS union select 'é' SQL_Latin1_General_CP1_CS_AS

    Return : 2 rows with the value 'e' and 'é'

    select 'e' collate SQL_Latin1_General_CP1_CI_AI union select 'é' SQL_Latin1_General_CP1_CI_AI

    Return : 1 row with the value 'e'

    Abnormal behaviour with c, ç values

    select 'c' collate SQL_Latin1_General_CP1_CI_AS union select 'ç' SQL_Latin1_General_CP1_CS_AS

    Return : 2 rows with the value 'c', 'ç'

    select 'c' collate SQL_Latin1_General_CP1_CI_AI union select 'ç' SQL_Latin1_General_CP1_CI_AI

    Return : 2 rows with the value 'c', 'ç' even if we use Accent insensivity

    I expect to see in this last query 1 row return with the value 'c' like we can see in the first example. I know that that it's not completely an accent but the goal of ignoring accent is to prevent duplicate from lazy people that don't put ponctuation on character.

    My feeling is that behaviour should be modified to include all other punctuated character like this in the accent insensitivity model.  

    Someone agree, disagree? thanks for your feedback.

    Thursday, June 1, 2017 2:53 PM

Answers

  • The SQL collations are legacy collations, and they have some inconsistent behaviour for varchar. This also returns two characters, despite the collation being case-insensitive.

    select 'Š' collate SQL_Latin1_General_CP1_CI_AI union select 'š' SQL_Latin1_General_CP1_CI_AI

    There are no issues for nvarchar where they behave like any other Windows collation, for instance:

    select N'c' collate SQL_Latin1_General_CP1_CI_AI union select N'ç' SQL_Latin1_General_CP1_CI_AI

    If you use a Windows collation, it does not matter whether you use varchar or nvarchar.

    As for changing existing collations, that is not going to happen, no matter how wrong they are, as that would render all indexes based on them invalid.

    Thursday, June 1, 2017 9:26 PM

All replies

  • The SQL collations are legacy collations, and they have some inconsistent behaviour for varchar. This also returns two characters, despite the collation being case-insensitive.

    select 'Š' collate SQL_Latin1_General_CP1_CI_AI union select 'š' SQL_Latin1_General_CP1_CI_AI

    There are no issues for nvarchar where they behave like any other Windows collation, for instance:

    select N'c' collate SQL_Latin1_General_CP1_CI_AI union select N'ç' SQL_Latin1_General_CP1_CI_AI

    If you use a Windows collation, it does not matter whether you use varchar or nvarchar.

    As for changing existing collations, that is not going to happen, no matter how wrong they are, as that would render all indexes based on them invalid.

    Thursday, June 1, 2017 9:26 PM
  • Thanks Erland for your solution that works well for me by converting my varchar to nvarchar.

    I understand that the best solution is to use a Windows Collation but this type of conversion is very painful.

    Friday, June 2, 2017 3:32 PM