locked
persian character RRS feed

  • Question

  • Hi everyone.

    My project has some Persian fields.

    In Windows, Persian (Farsi) language has two letter 'ی'. One is 'ي' named Arabic ye with code 64A and another is 'ی' named Farsi ye with code 6CC.

    I set the Sql server collation to Persian and the type of these fields to nvarchar.

    Now when I use insert/select command in SS management studio like 

    select * from employee where name like '%ی%'

    it converts Farsi ye to Arabic ye before query execution and it makes problems and I must always use "like N'%ی%'" to correct.

    Is it a bug?

    How can I fix it




    msdn

    Saturday, July 7, 2012 4:31 PM

Answers

  • Dan, thanks for your replying

    I think it is a bug because it must convert to Farsi char Not to Arabic

    Which code page 1256 character (http://msdn.microsoft.com/en-us/library/cc195058.aspx) do you expect 'ی' to map to?  I don't think it's a bug if the character simply does not exist in the collation's code page.  I would not expect a new Windows code page to be introduced to address a problem already solved by Unicode.


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    • Marked as answer by Milad Asadi Monday, July 9, 2012 4:45 AM
    Monday, July 9, 2012 4:16 AM
    Answerer

All replies

  • select * from employee where name like '%ی%'

    it converts Farsi ye to Arabic ye before query execution and it makes problems and I must always use "like N'%ی%'" to correct.

    You must prefix Unicode literal strings with N.  Otherwise, the literal is interpreted as varchar instead of nvarchar.  Characters that cannot be mapped under the current collation will cause problems.


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    • Proposed as answer by Naomi NEditor Sunday, July 8, 2012 4:29 AM
    Saturday, July 7, 2012 5:57 PM
    Answerer
  • I think it must convert but to Persian not to Arabic (because of the collation)

    Also in my keyboard when I press this button Persian ye is shown, so in windows the default ye is Persian not Arabic


    msdn

    Saturday, July 7, 2012 7:28 PM
  • What is the name of the collation you are using?

    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    Saturday, July 7, 2012 9:11 PM
    Answerer
  • my collation= Persian 100 CI

    msdn

    Sunday, July 8, 2012 3:52 AM
  • All Persian collations use code page 1256 (Windows Arabic):

    --Persian 100 CI collations
    SELECT
    	name
    	,COLLATIONPROPERTY(name, 'CodePage') AS CodePage
    FROM fn_helpcollations()
    WHERE name LIKE N'Persian_100_CI%';

    Code page 1256 does not have a mapping for the character 'ی' (Unicode 06CC) so the Farsi ye character is instead mapped to the alternate Arabic ye character 'ي' (Unicode 064A).  A question mark is used when no alternate character exists.  See http://msdn.microsoft.com/en-us/library/cc195058.aspx for a list of Windows code page 1256 characters and Unicode code point mappings.

    Since you are properly using nvarchar to store Persian characters, you just need to specify the N literal prefix to match the underlying Unicode data type.  This will avoid conversion problems because the collation code page is not involved with Unicode character mapping; the correct characters will be used and the collation will determine Unicode sorting and comparison rules (case insensitive).


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/


    Sunday, July 8, 2012 1:28 PM
    Answerer
  • Dan, thanks for your replying

    I think it is a bug because it must convert to Farsi char Not to Arabic

    Monday, July 9, 2012 3:36 AM
  • Dan, thanks for your replying

    I think it is a bug because it must convert to Farsi char Not to Arabic

    Which code page 1256 character (http://msdn.microsoft.com/en-us/library/cc195058.aspx) do you expect 'ی' to map to?  I don't think it's a bug if the character simply does not exist in the collation's code page.  I would not expect a new Windows code page to be introduced to address a problem already solved by Unicode.


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    • Marked as answer by Milad Asadi Monday, July 9, 2012 4:45 AM
    Monday, July 9, 2012 4:16 AM
    Answerer
  • Excuse me for jumping in :)

    But just to include this easier part of the original question...  should we say that there is a problem using plain strings ('   ')  vs. Unicode (N'  ')  because we are in a Unicode space?

    Wednesday, July 11, 2012 5:34 PM
  • But just to include this easier part of the original question...  should we say that there is a problem using plain strings ('   ')  vs. Unicode (N'  ')  because we are in a Unicode space?

    Well, it depends on what you exactly mean by the Unicode space since all characters are in the Unicode space.  For example, ASCII characters like "A" are in both the ASCII and Unicode space so the non-Unicode literal 'A' is the same character as the Unicode literal N'A' (albiet with different code points).  The problem with non-Unicode string literals happens only with non-ASCII characters that cannot be mapped according to the database default collation code page.

    If you are working with Unicode data types anyway, it is best to always specify Unicode string literals to avoid the code page ugliness entirely.


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    Thursday, July 12, 2012 12:07 AM
    Answerer