Wednesday, August 29, 2012 8:06 AMIf a column contains the word LBP along with a number, then the search is not working unless wild card is used.
For instance, there is a row with column value ImageRunner LBP 3460 and there are couple of more rows which has column value ends with the number 3460. If we searche with the keyword 3460, other rows are retrieved but for ImageRunner LBP 3460. If wild card * is used then all the rows including ImageRunner LBP 3460 is getting retrieved. If one of the letter in LBP is removed then the row is getting retrieved without wild card search *
SQL Server 2008 is unable to search column containing the word LBP + number without wild card search.
We checked the system defined stop words but could not find anything related to LBP.
Tried by configuring transform noise words but no luck.
Any idea what might be the issue?
Thanks in advance for all your help.
Friday, August 31, 2012 2:07 AMLBP value, is this a image file? I've never heared searching this value in sql server.
Wednesday, September 05, 2012 8:31 AMThis is the name of the course that is stored in the DB.
Wednesday, September 12, 2012 4:04 PM
This has been problem with other words as well. This has been a problem even with string "RS DI 123 246" where we were able to search with RS alone, DI alone, and RS DI 123*, but not with "RS DI 123 246" altogether in the contains clause. There are a lot of such combinations where SQL contains itself is not returning the data. Do we have a list of such known issues in the SQL documented, similar to LBP, Bali etc.
Wednesday, September 12, 2012 7:58 PM
It is likely all related to the Language being used and what it was encoded to recognize. For example, one thing that LBP stands for is "Lebanese Pound", a measure of currency. In some languages, LBP 3460 interprets the LBP as a currency symbol (like $ in the US) and 3460 as the currency value. You can investigate with:
select * from sys.dm_fts_parser('"LBP 3480" ', 0,--Neutral Language = 0, American English=1033 NULL,--No stop words 0)--Accent Insensitive
Use your language to test your results.
Tuesday, September 25, 2012 11:21 AM
I am facing similar issue when having the full text indexed column values like 'Test 443 323' or '345 521 Test'. When search string is given as 'Test 443 323', 'Test', '443 323', '345 521', '345 521 Test' it does return a results correspondingly, but for search strings '443', '323', '345', '521' it does not return any test results. When checked the display terms created as part of full text catalog building in the table 'sys.dm_fts_index_keywords_by_document', I see only 'Test', '443 323', '345 521', 'Test 443 323', '345 521 Test' only exist. But there are no display terms created for individual numbers '443', '323', '345', '521'.
And also I have verified for some other strings like 'Test 42 547', '58 12 Test', search was working with all combinations as I see following display terms were available - 'Test', '42', '547', 'Test 42 547', '58', '12', '58 12 Test'.
So, can some one from MS Team confirm if there is an issue with word breaks when full text indexed string contains text in the format of 'Test 443 323' or '345 521 Test'.
Any inputs would be of great help.
Tuesday, September 25, 2012 12:25 PM
What language word breaker are you using? Neutral or something else? If your string looks like this, then you also need to remember that you are searching for a phrase, not independent words.
select * from sys.dm_fts_parser('"Test 443 323"', 1033,--Neutral Language = 0, American English=1033 NULL,--No stop words 0)--Accent Insensitive
When I run full text as American English (1033) I get the following strings:
When I run full text as Neutral Language I get the following strings:
Since many languages use a space between groups of numbers the way American English uses a comma, the 423 323 is viewed as a single number. My test results using the Neutral and American English languages produces the same results for "Test 42 547".
However, if this is not a phrase but you are looking for the individual tokens all appearing together then you could search for:
"Test & 443 & 323". Because this has been broken by the & or | operator, both languages parse this as:
So, as you can see, the language of the word breaker makes a difference.
Wednesday, September 26, 2012 8:08 AM
Thanks Russell for your reply and the explanation, We are using Neutral language word breaker –
Exactly I am looking for reason in difference of below query results -
I mean when the results for query 2 and 4 are similar why not for query 1 and 3.
Thanks in Advance.
Wednesday, September 26, 2012 12:27 PM
Because the Neutral language (and a lot of other languages as well) support the numeric pattern: [nn]n nnn = [nn]n,nnn
But the [n]n nn only has 2 numbers in the rightmost component, so this does not trigger the pattern described above.
Note: I am not saying that this is what you want, I am just saying that is what the word breaker is doing.
If there is another character between the numbers, I does not cause this behavior. See:
select * from sys.dm_fts_parser('"Test 443-323"', 0,--Neutral Language = 0, American English=1033 NULL,--No stop words 0)--Accent Insensitive
This will behave like American English treats "Test 443 323".
PS - So, I suppose you could determine when to throw a punctuation between number strings in order to change the wordbreaker's decision.
- Edited by Russell FieldsMVP Wednesday, September 26, 2012 12:28 PM PS added