What am I missing regarding the Full-Text Indexing of my fields?
-
4 พฤษภาคม 2555 13:58
I am using SQL Server 2005 with a Full-Text Catalog containing two indexed tables. Each table has two fields that are a part of the index. One field in both tables is a Keywords field. The format of data entered into the field is like this...
certificate; certificate of insurance; certificates of insurance, coi
The part of my query containing the keywords goes like this...CONTAINSTABLE(MemberPage, (PageTitle, Keywords), @Keywords)
and the variable is like this...
SET @Keywords = '"*' + @Keywords + '*"'
When the user enters a search for "certificates of insurance" nothing is pulled back. I realize "of" is a noise word. What is it that I am missing regarding how the Keywords is indexed based on its contents that would cause no results to be returned? I thought the indexing would automatically skip the word "of" in the Keywords field making it look for something like "*certificates * insurance*"?
ตอบทั้งหมด
-
4 พฤษภาคม 2555 19:21
A few things to know. First, for an *string the leading * is ignored since only suffix wildcards are allowed. (So you can search for 'car*' but not '*car', meaning that you can find 'cars' but not 'scar'.)
In addition, noise word behavior is affected by:
exec sp_configure 'transform noise words', 1 RECONFIGURE GO
Also, you did not show your @Keywords string, but there are differences between how you might express your search criteria. For example, see the test code below for different results.
CREATE TABLE dbo.KeyWordLookup (ID INT NOT NULL, ContentDetail NVARCHAR(100)); CREATE UNIQUE INDEX ui_id ON dbo.KeyWordLookup(ID); CREATE FULLTEXT CATALOG ft AS DEFAULT; CREATE FULLTEXT INDEX ON dbo.KeyWordLookup(ContentDetail LANGUAGE 'English') KEY INDEX ui_id; INSERT INTO dbo.KeyWordLookup VALUES(1,'certificate of insurance'); INSERT INTO dbo.KeyWordLookup VALUES(2,'certificates of insurance'); INSERT INTO dbo.KeyWordLookup VALUES(3,'certificate insurance'); INSERT INTO dbo.KeyWordLookup VALUES(4,'certificate is insurance'); INSERT INTO dbo.KeyWordLookup VALUES(5,'certificates is insurance'); INSERT INTO dbo.KeyWordLookup VALUES(6,'certificates have insurance'); WAITFOR DELAY '00:00:05'; -- Finds 0 rows with noise words and 2 rows without noise words SELECT 'Words' AS Method, * FROM dbo.KeyWordLookup WHERE CONTAINS(ContentDetail,N'"certificate*" AND "of" AND "insurance"'); -- Finds 6 rows with noise words and 6 rows without noise words SELECT 'Words no noise' AS Method, * FROM dbo.KeyWordLookup WHERE CONTAINS(ContentDetail,N'"certificate*" AND "insurance"'); -- Finds 3 rows with noise words and 1 row without noise words SELECT 'Words & Phrase' AS Method, * FROM dbo.KeyWordLookup WHERE CONTAINS(ContentDetail,N'"certificate" AND "of insurance"'); -- Finds 2 rows with noise words and 1 row without noise words SELECT 'Phrase' AS Method, * FROM dbo.KeyWordLookup WHERE CONTAINS(ContentDetail,N'"certificate of insurance"') ; -- Finds 2 rows with noise words and 1 row without noise words SELECT 'Wildcard Phrase' AS Method, * FROM dbo.KeyWordLookup WHERE CONTAINS(ContentDetail,N'"certificate* of insurance"') ; DROP FULLTEXT INDEX on dbo.KeyWordLookup; DROP FULLTEXT CATALOG ft; DROP TABLE dbo.KeyWordLookup;
This just illustrates the effect of noise words even when you do not care about them. Since you are using CONTAINSTABLE there are a number of options, including INFLECTIONAL that allow you to tailor how words are identified. (If you drop the fulltext catalog, you will need to do a log backup before recreating it.)
Since you are using SQL Server 2005 it is not so easy to turn Noise Words on and off. What I do is run tests against LANGUAGE 'ENGLISH' where I have noise words and then against LANGUAGE 'BRITISH ENGLISH' where I have removed all the noise words.
Perhaps this gives you a start.
RLF
- เสนอเป็นคำตอบโดย amber zhang 7 พฤษภาคม 2555 2:12
- ยกเลิกการนำเสนอเป็นคำตอบโดย jiff9999 7 พฤษภาคม 2555 13:56
- ทำเครื่องหมายเป็นคำตอบโดย jiff9999 7 พฤษภาคม 2555 20:29
-
7 พฤษภาคม 2555 17:51
jiff9999, I waited a while for more information. The sp_configure code that I gave you transforms how noise words in the search criteria are treated.
exec sp_configure 'transform noise words', 1
You could write code to strip out noise words from the search criteria if you do not want to change the configuration, but I have run with the configuration for quite a while now and it works for me.
More and more we are not using noiseword/stopword lists and just indexing everything that is indexable according to the language rules. As servers are bigger and more powerful, we have not noticed any real problem with this. The second block of code was to show you some test cases.
The values differ according to (1) transform noise words setting and (2) whether there is a noiseword/stopword list in use.
Was my comment about testing with and without noise words unclear?
RLF
- แก้ไขโดย Russell FieldsMVP 7 พฤษภาคม 2555 17:52
- ทำเครื่องหมายเป็นคำตอบโดย jiff9999 7 พฤษภาคม 2555 20:28
-
7 พฤษภาคม 2555 20:28
I just now have a chance to reply. Your replies are clear.
I had already set the "transform noise words" to 1. The number of results I get using your example SQL with "TNW" set to 1 and having the noise word list active is 6, 6, 3, 2, 2 respectively.
I think it may be better if I just clear out the noise word list and see how that goes because like you mention, servers are more able to handle large amounts of data.
Thanks again!