Word breaker for a new language
-
Thursday, March 08, 2012 8:04 AM
Hello all,
I understand that people opt for a custom word breaker in these situations:
1. When there is no word breaker for the specific language.
2. When the existing word breaker for the specific language does not address the project's needs (for example, special treatment of special characters).
Unfortunately, we are in a situation where the messages stored in our tables do not follow the rules of natural languages, and data in multiple languages is stored in the same column.
Logically, I would like to treat this as an "Industrial language", on a par with English, German, Spanish and Japanese.
Below are my questions.
1. Is it true that a custom word breaker can only be developed for a specific language that has an LCID?
2. Suppose I want to develop a custom word breaker/stemmer for the "Industrial language" mentioned above, which doesn't have an LCID. Is that possible?
3. If yes, how are LCIDs for new languages created/registered?
I hope I have conveyed the problem clearly.
Thanks & Regards
Samba
All Replies
-
Monday, March 12, 2012 8:35 AM Moderator
Hi Samba,
As you know, each language has its own LCID. SQL Server registers word breakers for a number of languages by default, including English, German, Spanish and Japanese. You can list them with the query below:
SELECT * FROM sys.fulltext_languages
In addition, you can manually load licensed third-party word breakers for additional languages, such as Danish, Polish, and Turkish.
It seems that you cannot create a custom language that is a mixture of several languages. A full-text indexed column can specify only one language. I noticed your other thread about the XML data type column, and I think the suggestion in that thread is also appropriate for your scenario: XML word breaker and regular text word breaker.
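To illustrate the one-language-per-column point, here is a minimal sketch of how a language is bound to a column when the full-text index is created. The table dbo.Messages, the column MessageText, the key index PK_Messages and the catalog MessagesCatalog are hypothetical names used only for illustration. LCID 1033 is English; whether a different LCID from sys.fulltext_languages (for example the neutral entry, if your instance lists one) suits mixed-language data is something you would need to test.

```sql
-- List the LCIDs whose word breakers are registered with this instance.
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;

-- Hypothetical catalog; any existing full-text catalog would do.
CREATE FULLTEXT CATALOG MessagesCatalog;

-- Each indexed column takes exactly one LANGUAGE term.
-- dbo.Messages, MessageText and PK_Messages are assumed names;
-- PK_Messages must be a unique, single-column, non-nullable index.
CREATE FULLTEXT INDEX ON dbo.Messages
(
    MessageText LANGUAGE 1033  -- 1033 = English
)
KEY INDEX PK_Messages
ON MessagesCatalog;
```

The LANGUAGE term accepts only an LCID or language name that appears in sys.fulltext_languages, which is why a language without a registered LCID cannot be named here.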
Stephanie Lv
TechNet Community Support
-
Monday, March 12, 2012 11:26 AM
Hello Stephanie,
The answer provided in the "XML word breaker and regular text word breaker" thread is very clear.
The question in the current thread is: can I develop a word breaker for a newly defined, project-specific language?
I would appreciate it if you could go through the original question once again.
Thanks
Samba

