Custom Word breaker deliverables
-
February 27, 2012 10:46
Hello everyone,
I assume we may need to develop a custom word breaker for English as part of our project work.
Can somebody explain in detail what the inputs and deliverables of a custom word breaker are?
All replies
-
March... 
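As a rough illustration of the question above: conceptually, a word breaker takes a buffer of text as input and delivers a stream of words together with their positions in that buffer. The sketch below is a toy in Python, not the COM interface a real SQL Server word breaker implements; the names `Token` and `break_text` are made up for this example, and the rule of keeping `-` and `/` inside a word is only an assumption about what "special treatment of special characters" might mean for part numbers.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    """One word emitted by the toy breaker (hypothetical type)."""
    word: str
    offset: int  # start position of the word in the input text
    length: int  # number of characters in the word


# Assumption: runs of word characters joined by '-' or '/' count as one word,
# so an identifier such as "AB-12/X" is not split apart.
_WORD = re.compile(r"\w+(?:[-/]\w+)*")


def break_text(text: str) -> Iterator[Token]:
    """Input: a text buffer. Output: the words found, with their positions."""
    for match in _WORD.finditer(text):
        yield Token(match.group(), match.start(), len(match.group()))


tokens = list(break_text("Valve AB-12/X stuck"))
```

A real word breaker hands its words to the indexing engine through the interfaces described in the linked documentation rather than returning a list, but the input (text) and the output (words plus positions) have the same general shape.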
-
March 8, 2012 7:59
Hi Hilary,
I understand that people write a custom word breaker in two situations:
1. When there is no word breaker for the specific language.
2. When the existing word breaker for the language does not meet the project's needs (for example, special treatment of special characters).
Unfortunately, we are in a situation where the messages stored in our tables do not follow the rules of natural languages, and data in multiple languages is stored in the same column.
Logically, I would like to call this an "industrial language", alongside languages such as English, German, Spanish and Japanese.
Below are my questions.
1. Is it true that a custom word breaker can only be developed for a specific language that has an LCID?
2. If I want to develop a custom word breaker/stemmer for the "industrial language" described above, is that possible?
3. If yes, how are LCIDs for new languages created and registered?
I hope I have conveyed the problem properly.
Thanks & Regards
Samba
-
March 14, 2012 12:06
Hello Hilary,
Can you please provide your views on this concern?
Thanks & Regards
Samba
-
March 14, 2012 12:48
1) Yes.
2) The problem is language detection: how are you going to apply language-specific word breaker rules to "industrial" or multilanguage/blended content? You are free to tag the content with an LCID and have a word breaker written for that language, but your word breaker is going to be very complex. Most people break the content into different columns and apply a different word breaker to each column, i.e. one column for German, one for English, one for Japanese, etc.
3) You need to open a support incident with Microsoft for guidance on how to do this.
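The column-per-language approach from point 2 could be sketched roughly as follows. This is a minimal Python illustration, assuming each fragment of content already carries its LCID. The column names and the sample messages are made up; the three LCIDs are the standard Windows identifiers for US English, German (Germany) and Japanese.

```python
# Hypothetical column names, one per language.
COLUMN_FOR_LCID = {
    1033: "body_english",   # English (United States)
    1031: "body_german",    # German (Germany)
    1041: "body_japanese",  # Japanese
}


def split_into_columns(fragments):
    """Turn (lcid, text) pairs into one row with a text column per language."""
    row = {column: "" for column in COLUMN_FOR_LCID.values()}
    for lcid, text in fragments:
        column = COLUMN_FOR_LCID.get(lcid)
        if column is None:
            # No column (and therefore no word breaker) is configured for it.
            raise ValueError(f"no column configured for LCID {lcid}")
        row[column] = (row[column] + " " + text).strip()
    return row


row = split_into_columns([
    (1033, "pump failure"),   # made-up sample messages
    (1031, "Pumpe defekt"),
    (1033, "valve stuck"),
])
```

Each resulting column can then be full-text indexed with the word breaker for its own language, which avoids having one breaker guess the language of blended content.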
-
March 15, 2012 5:27
Hi Hilary,
Thank you very much for your response.
Below is a sample format of the language-blended XML data stored in a single column.
Based on the user-specified LCID and the search query, the content in the appropriate language has to be retrieved.
What would be the best approach for implementing FTS with minimal effort?
Regarding answer 2, do you mean that I should simply register the new custom word breaker (for the industrial language) under one of the existing languages and select that language's word breaker when creating the full-text index? Is my understanding correct?
-
March 19, 2012 6:04
Hi Hilary,
Could you please advise on the above issue?
Thanks
Samba
-
March 20, 2012 14:18
That will not work. Here is an example of how to set it. You need to use the xml:lang attribute.
<myXmlDoc>
  <docTitle>
    <docENUTitle xml:lang="en-us"> Yukon full-text search </docENUTitle>
    <docDEUTitle xml:lang="de"> Yukon full-text search (german equivalent) </docDEUTitle>
  </docTitle>
  …
</myXmlDoc>
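To see which text falls under which xml:lang value in a document shaped like this, a small Python sketch using the standard library's `xml.etree.ElementTree` may help. It only illustrates how the attribute scopes the text beneath it; it is not how SQL Server processes the column internally. The function name `text_by_language` and the `"neutral"` fallback label are assumptions for this example, and the sample is a trimmed copy of the document above without the elided part.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_by_language(xml_text, default_lang="neutral"):
    """Group the text of an XML document by the xml:lang value in scope."""
    grouped = defaultdict(list)

    def walk(elem, inherited):
        # xml:lang applies to an element and everything nested inside it.
        lang = elem.get(XML_LANG, inherited)
        if elem.text and elem.text.strip():
            grouped[lang].append(elem.text.strip())
        for child in elem:
            walk(child, lang)
            # Text after a child's closing tag belongs to this element.
            if child.tail and child.tail.strip():
                grouped[lang].append(child.tail.strip())

    walk(ET.fromstring(xml_text), default_lang)
    return dict(grouped)


sample = """<myXmlDoc>
  <docTitle>
    <docENUTitle xml:lang="en-us"> Yukon full-text search </docENUTitle>
    <docDEUTitle xml:lang="de"> Yukon full-text search (german equivalent) </docDEUTitle>
  </docTitle>
</myXmlDoc>"""

result = text_by_language(sample)
```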
-
March 21, 2012 4:00
Hello Hilary,
Thanks for your response.
Assume I have the XML document with the xml:lang attribute set for the respective languages.
1. Which word breaker do I need to choose when creating the full-text index?
2. How do I write the FTS query, considering that the search term can be in any language?
I really appreciate your time for answering these questions.
Thanks
Samba