locked
No True Fuzzy Search Under Sql Server 2008 ? RRS feed

  • Question

  • Greetings,

    In context of SQL Server 2008's full-text search capability, does anyone know if there are any near future plans for supporting a true (natively supported) fuzzy search algorithm out of the box? We are evaluating SQL Server 2008’s full-text search capabilities and so far it has been really great with the exception of the Fuzzy search.

    So that the responder is aware of the length of my research here is what I already know:

    1) I know Oracle uses the fuzzy question mark (?) operator to expand queries to include words that are spelled similarly to the specified term. This type of expansion is helpful for finding more accurate results when there are frequent misspellings in the documents in the database. This is the sort of capability we are looking to find in SQL Server 2008.

    2) Currently the main fuzzy utility available via SQL Server 2008 (and prior versions) is an implementation of the "soundex" algorithm which is known for returning a lot of false positive and poor matches.

    3) I do know that Microsoft does have the newly acquired FAST ESP product whose fuzzy searching capabilities are considered to be phenomenal but my current interest is leveraging SQL Server’s full-text searching capability and determining if a true Fuzzy search support is in the cards.

    4) I also do know that one can implement, using SQL CLR and a form of Ternary Search Tree, a flavor of Fuzzy searching algorithm within SQL Server 2008 but there is time and cost associated with this effort that we would like to avoid (see Pro Full-Text Search in SQL Server 2008 by Michael Coles with Hilary Cotter – which I have read cover to cover). I am aware of Q-Gram, Levenshtein Edit Distance, Metaphone and Double Metaphone algorithms and their implementations.

    5) I know one can try to clean the data using SQL Server Integration Services but that is not the path we want to take. We want to fuzzy search the data as is and within the context of SQL Server 2008's full-text indexing capability.

    So to sum up, I just want to know if, within the context of full-text searching, a true FUZZY search capability is in the works for SQL Server 2008 (service packs) and beyond. If so, how soon ?

    If Fuzzy Search is not coming to SQL Server 2008 any time soon, then I suppose the only path left is to do it yourself using something akin to item #4 above.

    Thanks,

    Tareen
    Friday, April 3, 2009 3:31 PM

All replies

  • Hi Tareen

    1) SQL Server FTS does not ship with this feature. What you will need to do is put the spelling variations as substitutions in your thesaurus file.
    2) Actually it does not do soundex, what it does is linguistic stemming, so it will return results to the plural or singular form of the word, and then all declensions, ie book will be stemmed and searched on as book, books, booked, booking, etc.
    3)  Its difficult to predict how Microsoft will integrate FAST into its search products. FAST as you know is very fast, but its storage requirements can exceed the documents you index. Plus Microsoft's search algorithim is considered to be the best in the trec nist "competitions".
    4) I am working on a shocking fast method of affix/prefix searches, but its storage requirements are also signficant. Again, this search is really precise, not the fuzzy sort you require.
    5) Other than the steps mentioned in 1) it can't be done.

    I find #4 to be too slow to be practical.
    looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941
    Friday, April 3, 2009 4:57 PM
  • Hi All,

    Guys I want to ask important question and need the answer ASAP please.

    My question is; Can I use FAST ESP to search my own DB tables [Lemmatization], and it is out of the box or I have to write code?

     

    Awaiting your reply...

    Wednesday, April 21, 2010 7:26 AM
  • From what I can gather you will need to write your own code to have FAST index the rows of your database.
    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941
    Wednesday, April 21, 2010 10:51 AM
  • Thanks Hilary, but just to confirm, you mean that I can use FAST ESP to connect my own database and searching the tables exists, right?

    Thanks in advance.

    Wednesday, April 21, 2010 11:38 AM
  • FAST used to ship with a component which would allow you to create ODBC connections to your database and fast could index your tables and then you could search them. There was some coding involved but not a lot.

    It was called FAST Data Search.

    Microsoft has bought out FAST so you will need to check with them as to what products/components they will be integrating with the next product release. It does appear that this component will still be in the next versions.

    Note that while FAST is fast it also has a very large index. I have seen implementations where the size of the index was larger than the amount of data indexed.

     


    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941
    Wednesday, April 21, 2010 1:19 PM
  • Thanks a lot, you are the man.
    Wednesday, April 21, 2010 1:34 PM