Frage False hits with Contains query

  • 15 декабря 2005 г. 13:52
     
     

    I'lll try this question here too, since I coudn't get an answer in the T-SQL forum

    http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=166991&SiteID=1

    I cant figure out why a query returns false results.

    It is basically:

    select *
    from sometable
    where somecolumn=1
    and contains(someothercolumn, 'arti-b')

    This query takes forever to process and returns about 100 false hits for every row where someothercolumn actually contains the string 'arti-b'.
    Note, I don't get the full set of rows where somecolumn=1, so there is some filtering from the contains clause.

    If I use the same query, searching for just 'arti', it works fine. 'whatever-b' seems to work as well, as long as 'whatever' != 'arti'.

    Does anyone know what causes this?

    SQL server Enterprise edition
    version 8 (SP4)
    Language: US english
    Collation: Finnish_swedish.

Все ответы

  • 17 декабря 2005 г. 17:14
     
     
    Full-text search is a word-based tool for natural language searches.  It's not designed to handle punctuation.

    As suggested in http://support.microsoft.com/kb/200043/EN-US/:

    Where non-alphanumeric character must be used in the search critera (primarily the dash '-' character), use the Transact-SQL LIKE clause instead of the FULLTEXT or CONTAINS predicates.

    In other words, use this query:

    select * from sometable
    where somecolumn=1
    and someothercolumn like '%arti-b%'

    Steve Kass
    Drew University
  • 19 декабря 2005 г. 8:18
     
     

    I see. I never found that KB article.

    I don't suppose this behaviour of the full text search engine can be configured somehow?

    A LIKE-query is not an option for me. Someothercolumn is text and there are over a million rows in the sometable.

    How does the search engine handle hyphens? Is it better to filter out any such strings or is it possible to get something acceptable by querying for CONTAINS(someothertable, "arti b") or perhaps "arti*" ?

    /Gustav