Fuzzy search in database

  • Tuesday, October 13, 2009 1:59 PM Dmitry1983
     
    Hi all!

    I have a task that involves fuzzy searching in a database.
    The database is assumed to be SQL Server.
    The goal is to be able to search a large database using approximate input strings.

    So, I have a couple of questions to start my research on this task.
    Is it realistic to implement fuzzy searching in a database myself, or is it likely to be too tricky?

    Or is it more reasonable to use ready-made tools?


    Thanks!


All Replies

  • Tuesday, October 13, 2009 2:47 PM Hilary Cotter (MVP)
     
    SQL Server does not really support fuzzy searching. Freetext is sometimes considered to be fuzzy, but it is not really.

    You will need to implement something like Levenshtein edit distance for this.
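
    To make the edit distance idea concrete, here is a minimal sketch in Python (not T-SQL, and not anything built into SQL Server). The names `levenshtein`, `fuzzy_filter`, and `max_distance` are made up for illustration, and the filter assumes the candidate strings have already been read out of the database.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def fuzzy_filter(query, candidates, max_distance=2):
    """Hypothetical helper: keep candidates within max_distance edits of query."""
    query = query.lower()
    return [c for c in candidates if levenshtein(query, c.lower()) <= max_distance]
```

    Comparing the query against every row this way costs one distance calculation per row, so on a large table it would probably need some cheaper pre-filter to narrow down the candidates first.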
    looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941
  • Wednesday, October 14, 2009 3:44 PM Dmitry1983
     
    Thanks, Hilary

    To continue the discussion, please help me find where to start.

    My simplified task is to find several algorithms (by "algorithm" I mean both the edit distance calculation and, more importantly, the text indexing algorithm), then select the most appropriate one in terms of performance and implement it.

    Any links to those algorithms and their descriptions are appreciated.


    Thanks in advance,
    Dmitry
  • Thursday, October 15, 2009 10:10 AM Hilary Cotter (MVP)
     Answer
    For the full-text search algorithm, search for Okapi BM25. For Levenshtein edit distance, search for "Levenshtein edit distance". Here is a sample:

    http://www.merriampark.com/ld.htm
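
    As a rough idea of what Okapi BM25 computes, here is a minimal Python sketch of the textbook scoring formula. It is not necessarily what SQL Server does internally. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the `+ 1.0` inside the logarithm (a common variant that keeps the IDF non-negative) are assumptions made for illustration, and documents are assumed to be already split into tokens.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25.

    documents is a list of token lists; returns one float per document.
    """
    if not documents:
        return []
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Document frequency: in how many documents each term appears
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # a term missing from the document contributes nothing
            df = doc_freq[term]
            # Rare terms get a higher weight than common ones
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Longer-than-average documents are penalized through b
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores
```

    Note that BM25 ranks documents by exact term matches, so it does not handle misspelled query terms by itself. That part is what the edit distance is for.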