Compare 2 Words

  • Question

  • Questions:
    Is there a way to check whether "Hollændervej" and "Hollaedervej" are similar apart from the "ae" versus "æ" difference, after running the Levenshtein algorithm?


    Background: 
    It is easier to compare the words "asdf" and "ssdf" after using the Levenshtein algorithm, because you get a difference value of 1.

    In order to compare "asdf" and "ssdf" you just remove the first letter.

    Compared to that, "Hollændervej" and "Hollaedervej" are a little more difficult.

    I just don't know how to solve it.

    Thank you!
    Thursday, June 7, 2018 8:19 PM

All replies

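  • Well, the Levenshtein distance between them should be 2, right?  As written (both words have 12 letters), that is two replacements: 'æ' becomes 'a' and 'n' becomes 'e'.  With the spelling "Hollaendervej" it is also 2: one insertion, one replacement.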
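
    The distance can be checked with a minimal sketch of the standard dynamic-programming Levenshtein algorithm. The class and method names (EditDistance, Levenshtein) are illustrative and not from any library; every insertion, deletion and replacement is assumed to cost 1.

```csharp
using System;

public static class EditDistance
{
    // Levenshtein distance with unit costs for insert, delete and replace.
    // Compares UTF-16 chars, so 'æ' (U+00E6) counts as a single symbol.
    public static int Levenshtein(string a, string b)
    {
        // previous[j] = distance between the first i-1 chars of a and the first j chars of b.
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Turning an empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i; // deleting i chars of a yields the empty string
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1,   // insertion
                             previous[j] + 1),     // deletion
                    previous[j - 1] + cost);       // replacement or match
            }

            // The row just filled becomes the "previous" row for the next iteration.
            var tmp = previous;
            previous = current;
            current = tmp;
        }

        return previous[b.Length];
    }
}
```

    For example, EditDistance.Levenshtein("asdf", "ssdf") gives 1, and EditDistance.Levenshtein("Hollændervej", "Hollaedervej") gives 2.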

    One alternative, of course, is to replace all of the potential ligatures with their equivalents.  So, before comparing, replace 'æ' with 'ae', and so on.  That's often done with the German 'ß', which compares as 'ss'.
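
    A rough sketch of that replacement step might look like the following. The LigatureFolding class and its mapping table are assumptions made for illustration; the table is deliberately incomplete and would need extending for the languages actually involved.

```csharp
using System.Collections.Generic;
using System.Text;

public static class LigatureFolding
{
    // Illustrative, incomplete mapping from single characters to their expansions.
    private static readonly Dictionary<char, string> Expansions = new Dictionary<char, string>
    {
        { 'æ', "ae" }, { 'Æ', "AE" },
        { 'œ', "oe" }, { 'Œ', "OE" },
        { 'ß', "ss" },
    };

    // Returns a copy of input in which every mapped character is replaced by its expansion.
    public static string Expand(string input)
    {
        var sb = new StringBuilder(input.Length + 4);
        foreach (char c in input)
        {
            if (Expansions.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

    With this, LigatureFolding.Expand("Hollændervej") yields "Hollaendervej", so the expanded strings can be passed to whatever distance function is in use and the 'æ' no longer contributes to the result.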


    Tim Roberts, Driver MVP Providenza & Boekelheide, Inc.

    • Proposed as answer by Dolen Zhang Wednesday, June 20, 2018 9:01 AM
    Thursday, June 7, 2018 9:12 PM
  • A distance measure is only one step in text processing. The first step should be data preparation and normalization. 

    It depends on what you want to compare and how. The Levenshtein algorithm is only one way to compare. If you are comparing just two words without a base dictionary, Levenshtein could be the right choice. But if you have a base dictionary or a larger word collection, it is better to index it than to compare every pair directly. 
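
    As a rough illustration of indexing by a normalized form, the sketch below groups words under a shared key so that lookups avoid pairwise comparison. WordIndex, Normalize and Build are hypothetical names, and the normalization rule (lower-casing plus expanding 'æ' to "ae") is only an assumed example of the preparation step.

```csharp
using System;
using System.Collections.Generic;

public static class WordIndex
{
    // Assumed normalization for illustration: lower-case, then expand 'æ' to "ae".
    public static string Normalize(string word)
    {
        return word.ToLowerInvariant().Replace("æ", "ae");
    }

    // Groups the words by normalized key; spellings that normalize alike share a bucket.
    public static Dictionary<string, List<string>> Build(IEnumerable<string> words)
    {
        var index = new Dictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (string word in words)
        {
            string key = Normalize(word);
            if (!index.TryGetValue(key, out List<string> bucket))
            {
                bucket = new List<string>();
                index[key] = bucket;
            }
            bucket.Add(word);
        }
        return index;
    }
}
```

    After building the index from "Hollændervej", "Hollaendervej" and "Nørregade", the key "hollaendervej" holds the first two spellings, and a query word only has to be normalized once before the lookup.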

    Friday, June 8, 2018 4:42 AM
  • If the 'n' is present in both words, then string.Equals("Hollændervej", "Hollaendervej", StringComparison.InvariantCultureIgnoreCase) returns true.
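
    To see how this differs from an ordinal comparison, a small sketch can wrap both calls side by side. ComparisonDemo and its method names are made up for illustration. The culture-sensitive result comes from the platform's collation data, so, as far as I know, it can differ between .NET Framework and newer .NET runtimes and is worth checking on the target system rather than assuming.

```csharp
using System;

public static class ComparisonDemo
{
    // Ordinal comparison: compares UTF-16 code units, ignoring case only.
    // 'æ' and "ae" are different code units, so they never match here.
    public static bool EqualOrdinal(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    // Culture-sensitive comparison using the invariant culture's collation rules.
    // Whether 'æ' matches "ae" is decided by the runtime's collation data.
    public static bool EqualInvariant(string a, string b)
    {
        return string.Equals(a, b, StringComparison.InvariantCultureIgnoreCase);
    }
}
```

    Calling both methods with "Hollændervej" and "Hollaendervej" and printing the results shows which behavior the current runtime provides; the ordinal variant reports the two strings as different.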


    • Proposed as answer by Stanly Fan Wednesday, July 4, 2018 6:03 AM
    Friday, June 8, 2018 5:42 AM