Full text search for Turkish language

  • Question

  • Hi,

    The issue is related to full-text indexing. Full-text search in MS SQL Server is case-insensitive, and this runs into the classic Turkish 'i' problem. The Turkish alphabet has a letter 'i' with a dot and another letter 'ı' without a dot. Their upper-case forms are 'İ' (with a dot) and 'I' (without a dot). The following full-text search query returns different results depending on whether the operating system is Turkish or English:
    SELECT * FROM Table_Name f WHERE CONTAINS(f.Column_Name, '"incilipinar"')


    In this case the upper-case form of the word would be "İNCİLİPİNAR", where every I has a dot on top. On the English OS, however, the query above returns the rows where I does not have a dot, and not the rows where İ has a dot on top. That is, it matches the word "INCILIPINAR". On the Turkish OS, the same query returns the rows where İ has a dot on top. That is, it matches the word "İNCİLİPİNAR". In both cases the third-party word breakers for Turkish have been loaded.
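
    To make the casing difference described above easier to see, here is a minimal sketch that compares how 'i' and 'I' are case-mapped under a Turkish and a non-Turkish collation. Latin1_General_CI_AS is only an assumed point of comparison and is not mentioned in this thread. The query only exercises collation-level casing, not the full-text engine itself.

    ```sql
    -- Case mapping of the dotted lower-case i and the dotless upper-case I
    -- under two collations. Latin1_General_CI_AS is an assumed comparison
    -- collation, chosen only because it is not Turkish.
    SELECT
        UPPER(N'i' COLLATE Turkish_CI_AS)        AS upper_i_turkish,
        UPPER(N'i' COLLATE Latin1_General_CI_AS) AS upper_i_latin,
        LOWER(N'I' COLLATE Turkish_CI_AS)        AS lower_I_turkish,
        LOWER(N'I' COLLATE Latin1_General_CI_AS) AS lower_I_latin;

    -- Case-insensitive equality of the same letters under the Turkish collation.
    SELECT
        CASE WHEN N'i' = N'I' COLLATE Turkish_CI_AS THEN 1 ELSE 0 END AS i_equals_dotless_I,
        CASE WHEN N'i' = N'İ' COLLATE Turkish_CI_AS THEN 1 ELSE 0 END AS i_equals_dotted_I;
    ```

    If the two collations return different letters in the first result set, that is the same kind of mismatch the two servers show for "incilipinar".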

    Why is this so? The SQL Server collation is the same on both systems: "Turkish_CI_AS". How can this be fixed?

    Thank you

    Wednesday, August 1, 2012 11:12 AM

Answers

  • Hello,

    For FTS indexing, the lower-case form of each word is always used, and that causes the problem here. You can check how the words are tokenized with the FTS parser, using LCID 1055 (Turkish):

    SELECT *
    FROM sys.dm_fts_parser(N' "incilipinar İNCİLİPİNAR" ', 1055, 0, 0)

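    As a possible follow-up to the parser check above, the sketch below runs the same input through the Turkish (1055) and the English (1033) word breakers side by side, and then lists the language each full-text indexed column was created with. The extra test word "INCILIPINAR" and the choice of 1033 as the comparison language are assumptions for illustration. Running it on both servers may show where the tokens start to differ.

    ```sql
    -- Tokenize the same phrase with the Turkish (1055) and English (1033)
    -- word breakers so the resulting terms can be compared row by row.
    SELECT 1055 AS lcid, occurrence, display_term, keyword
    FROM sys.dm_fts_parser(N'"incilipinar İNCİLİPİNAR INCILIPINAR"', 1055, 0, 0)
    UNION ALL
    SELECT 1033 AS lcid, occurrence, display_term, keyword
    FROM sys.dm_fts_parser(N'"incilipinar İNCİLİPİNAR INCILIPINAR"', 1033, 0, 0);

    -- List the language assigned to each full-text indexed column
    -- in the current database.
    SELECT OBJECT_NAME(c.object_id)           AS table_name,
           COL_NAME(c.object_id, c.column_id) AS column_name,
           c.language_id,
           l.name                             AS language_name
    FROM sys.fulltext_index_columns AS c
    JOIN sys.fulltext_languages AS l
        ON l.lcid = c.language_id;
    ```

    The keyword column is the binary form of each token, so comparing it between the two servers avoids any display or font ambiguity between 'I' and 'İ'.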

    Olaf Helper
    * cogito ergo sum * errare humanum est * quote erat demonstrandum *
    If I think, that is a mistake, and I prove it daily


    Wednesday, August 1, 2012 11:34 AM
    Moderator