none
FTS - Noise words cause no results to be returned

    Question

  • Hi All.

    I'm hoping someone can comment on the issue I'm having with full text search when it encounters a noise word.

    For example, assume we have a Publication name of "Something at Somewhere".

    The following query will return no results:


    select
    * from Publications WHERE CONTAINS((PublicationName),'"Something " AND "at" AND "Somewhere"')


    Whereas the below query will return the correct results:

    select
    * from Publications WHERE CONTAINS((PublicationName),'"Something" AND "Somewhere"')
    If the terms are "" qualified or not makes no difference either.

    This behaviour seems to be the case when any noise word is encountered (not just "at" as in the above example).  BOL states that when a noise word is  encountered, it is excluded from the query but something else seems to be going on here.

    Transform noise words is 0, and I'm running 9.00.4035.00.

    Cheers,
    Rob

    • Moved by Tom PhillipsModerator Friday, January 22, 2010 1:46 PM TSQL Question (From:SQL Server Database Engine)
    Thursday, January 21, 2010 1:49 PM

Answers

  • Hi Rob,

    Am not sure if you had a question, but hope this explains the behavior you described above.

    You are searching based on simple_term and since you are enclosing the entire search string within double quotes it is considered a phrase. Now if you were to remove the double quotes for each term and search like this,

    select * from Publications WHERE CONTAINS(PublicationName, 'Something AND at AND Somewhere')

    And then run the query you will see query did indeed contain noise words. If you switch to the messages window, you should see the warning which confirm the search had noise words detected - Informational: The full-text search condition contained noise word(s).

    Now, remove the noise word AT and run the query like this,

    select * from Publications WHERE CONTAINS(PublicationName,'Something AND Somewhere')

    This time it will return the 1 row back. The difference between the two is the usage of simple_term search condition, which indicates to SQL that search for match(es) for an exact word or a phrase. [AND is ignore since it's a LOGICAL condition for the search terms]

    What you are seeing is the expected behavior, as when any noise word is encountered in the search condition, you get the warning message displayed. The key is that since you are using AND clause the results are aggregated, so each condition must be true. Since there was a noise word it returned FALSE, so overall it didn't return any row.

    Also, BOL means that noise words are not stored in the full text index and error is displayed when single word search condition contained noise words. To prove this, run this,

    select * from Publications WHERE CONTAINS(PublicationName,'"Something at Somewhere"')

    This time you will get the 1 row returned back even though it contained AT keyword, but also display the warning that search condition contained noise words. This is because it just ignored the word AT and returned rows which contained something & somewhere.

    This is very much expected behavior. Makes sense ?
    Patience is a Virtue... So is "Delayed IO" :)
    Friday, January 22, 2010 3:15 AM

All replies

  • Hi Rob,

    Am not sure if you had a question, but hope this explains the behavior you described above.

    You are searching based on simple_term and since you are enclosing the entire search string within double quotes it is considered a phrase. Now if you were to remove the double quotes for each term and search like this,

    select * from Publications WHERE CONTAINS(PublicationName, 'Something AND at AND Somewhere')

    And then run the query you will see query did indeed contain noise words. If you switch to the messages window, you should see the warning which confirm the search had noise words detected - Informational: The full-text search condition contained noise word(s).

    Now, remove the noise word AT and run the query like this,

    select * from Publications WHERE CONTAINS(PublicationName,'Something AND Somewhere')

    This time it will return the 1 row back. The difference between the two is the usage of simple_term search condition, which indicates to SQL that search for match(es) for an exact word or a phrase. [AND is ignore since it's a LOGICAL condition for the search terms]

    What you are seeing is the expected behavior, as when any noise word is encountered in the search condition, you get the warning message displayed. The key is that since you are using AND clause the results are aggregated, so each condition must be true. Since there was a noise word it returned FALSE, so overall it didn't return any row.

    Also, BOL means that noise words are not stored in the full text index and error is displayed when single word search condition contained noise words. To prove this, run this,

    select * from Publications WHERE CONTAINS(PublicationName,'"Something at Somewhere"')

    This time you will get the 1 row returned back even though it contained AT keyword, but also display the warning that search condition contained noise words. This is because it just ignored the word AT and returned rows which contained something & somewhere.

    This is very much expected behavior. Makes sense ?
    Patience is a Virtue... So is "Delayed IO" :)
    Friday, January 22, 2010 3:15 AM
  • Many thanks for the reply Sundarshan - much appreciated.

    However I wanted to clarify a few things.

    I am not enclosing the entire term in double quotes - only each individual word (when I do this I also receive the ..contained noise word(s) warning).

    However, even if I search as such:

    select * from Publications WHERE CONTAINS(PublicationName, 'Something AND at AND Somewhere')

    I receive the informational message (...contained noise word(s)) yet no results are returned.  Shouldn't FTS simply exclude the noise word and search for 'Something AND Somewhere'?  From your reply it seems it shouldn't return any rows but I'm confused why this is as it is correctly identifying the noise word, and therefore I would assume correctly removing it from the search term to become 'Something AND Somewhere'.

    This was my understanding of BOL, is it incorrect?  Or, is a noise word only removed if the search term is fully qualified as '"Something at Somewhere"'?

    Thanks,
    Rob
    Friday, January 22, 2010 8:52 AM
  • To supress noise words in query use:

     

    exec sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    exec sp_configure 'transform noise words', 1
    RECONFIGURE
    GO
    

     

    • Proposed as answer by Alexey Kozlov Thursday, October 21, 2010 12:21 PM
    • Edited by Alexey Kozlov Monday, November 08, 2010 8:41 AM fixed
    Thursday, October 21, 2010 12:06 PM
  • Alexey,

    EXEC is missing in front of sp_configure. Can you fix it? Thanks.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Tuesday, October 26, 2010 8:05 AM
    Moderator