Answered by:
Excluded character from accent sensitivity

Question
-
We just get an error from our Analysis Service when processing a dimension that include accent. Analysis found a duplicate key even if we are using Collation Accent insensitive. I made some tests to figured out what happens and found out this:
Works as expected for e, é values
select 'e' collate SQL_Latin1_General_CP1_CI_AS union select 'é' SQL_Latin1_General_CP1_CS_AS
Return : 2 rows with the value 'e' and 'é'
select 'e' collate SQL_Latin1_General_CP1_CI_AI union select 'é' SQL_Latin1_General_CP1_CI_AI
Return : 1 row with the value 'e'
Abnormal behaviour with c, ç values
select 'c' collate SQL_Latin1_General_CP1_CI_AS union select 'ç' SQL_Latin1_General_CP1_CS_AS
Return : 2 rows with the value 'c', 'ç'
select 'c' collate SQL_Latin1_General_CP1_CI_AI union select 'ç' SQL_Latin1_General_CP1_CI_AI
Return : 2 rows with the value 'c', 'ç' even if we use Accent insensivity
I expect to see in this last query 1 row return with the value 'c' like we can see in the first example. I know that that it's not completely an accent but the goal of ignoring accent is to prevent duplicate from lazy people that don't put ponctuation on character.
My feeling is that behaviour should be modified to include all other punctuated character like this in the accent insensitivity model.
Someone agree, disagree? thanks for your feedback.
Thursday, June 1, 2017 2:53 PM
Answers
-
The SQL collations are legacy collations, and they have some inconsistent behaviour for varchar. This also returns two characters, despite the collation being case-insensitive.
select 'Š' collate SQL_Latin1_General_CP1_CI_AI union select 'š' SQL_Latin1_General_CP1_CI_AI
There are no issues for nvarchar where they behave like any other Windows collation, for instance:
select N'c' collate SQL_Latin1_General_CP1_CI_AI union select N'ç' SQL_Latin1_General_CP1_CI_AI
If you use a Windows collation, it does not matter whether you use varchar or nvarchar.
As for changing existing collations, that is not going to happen, no matter how wrong they are, as that would render all indexes based on them invalid.
- Proposed as answer by Lin LengMicrosoft contingent staff Wednesday, June 14, 2017 8:01 AM
- Marked as answer by Steve Gou Wednesday, June 14, 2017 11:13 AM
Thursday, June 1, 2017 9:26 PM
All replies
-
The SQL collations are legacy collations, and they have some inconsistent behaviour for varchar. This also returns two characters, despite the collation being case-insensitive.
select 'Š' collate SQL_Latin1_General_CP1_CI_AI union select 'š' SQL_Latin1_General_CP1_CI_AI
There are no issues for nvarchar where they behave like any other Windows collation, for instance:
select N'c' collate SQL_Latin1_General_CP1_CI_AI union select N'ç' SQL_Latin1_General_CP1_CI_AI
If you use a Windows collation, it does not matter whether you use varchar or nvarchar.
As for changing existing collations, that is not going to happen, no matter how wrong they are, as that would render all indexes based on them invalid.
- Proposed as answer by Lin LengMicrosoft contingent staff Wednesday, June 14, 2017 8:01 AM
- Marked as answer by Steve Gou Wednesday, June 14, 2017 11:13 AM
Thursday, June 1, 2017 9:26 PM -
Thanks Erland for your solution that works well for me by converting my varchar to nvarchar.
I understand that the best solution is to use a Windows Collation but this type of conversion is very painful.
Friday, June 2, 2017 3:32 PM