How to find misspelled words with regex
- Hi all
I need a RegEx that will find a specific word within a long string. The issue is that the word may be misspelled, and I need to find it even so. I would like to accept a certain percentage of wrongness when looking for the word. Example:
The complete string: Hello, this is my comp/et string to look at
The word to search for: complete
Let’s say that I wish to accept a maximum of two wrong letters in the above string; then the RegEx should match the word complete. However, if I only accept one wrong letter, it shouldn’t find it. Ideally the RegEx would also be able to handle whitespace and missing letters. Example:
The complete string: Hello, this is my com plee string to look at
The word to search for: complete
This should match the word as well, even though there is a whitespace between ‘m’ and ‘p’ and the letter ‘t’ is missing. Is this possible at all with RegEx, or should I be looking at an alternative way to solve it?
Thanks, Tommy
Answers
- Regex is not designed to be a tokenizer and that is where it will fall short for this situation.
Looking at your example, the word common would count as a misspelling of complete, yet it is not one. In the plee example you mentioned, plee has nothing to do with com; it would need its own rule to flag it as a problem. Hopefully no one will be writing about Comanches as well; that would not bring comfort to your parser.
If you are willing to create multiple patterns and string C#/VB logic behind them to handle each situation, then sure, it can be done. But no one or two patterns will handle this.
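To make the "multiple patterns" idea concrete, here is a minimal sketch that is not part of the original answer. It is written in Python for brevity (the same idea carries over to .NET's Regex class), and the helper name `one_substitution_pattern` is hypothetical. It builds one alternative per letter position, so it only covers a single substituted character; missing letters, extra whitespace, or a second error would each need further generated patterns, which is the cost described above.

```python
import re

def one_substitution_pattern(word):
    """Build a regex that matches `word` with at most one character replaced."""
    # One alternative per position: that position becomes a wildcard,
    # every other character must match literally.
    variants = [
        re.escape(word[:i]) + "." + re.escape(word[i + 1:])
        for i in range(len(word))
    ]
    return re.compile("|".join(variants), re.IGNORECASE)

pattern = one_substitution_pattern("complete")

# One wrong letter ('/' instead of 'l'): found.
print(bool(pattern.search("Hello, this is my comp/ete string")))  # True

# The original example has a wrong letter AND a missing letter: not found.
print(bool(pattern.search("Hello, this is my comp/et string")))   # False
```

The number of alternatives grows quickly once two errors, insertions, and deletions are allowed, so this approach is only practical for very small error budgets.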
Maybe this is a class assignment or your own sandbox work, and it can be done for a few words, but each word will need its own processing logic, and the return on the amount of work will not be worth it, IMHO. GL
William Wegerson (www.OmegaCoder.Com)
- Marked As Answer by eryangMSFT, Moderator, Wednesday, November 11, 2009 10:05 AM
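On the question of an alternative to regex: one common option for "at most N wrong letters" is approximate matching by edit distance (Levenshtein distance), where each substituted, missing, or extra character counts as one error. The sketch below is an illustration under that assumption, not something proposed in the thread; it is in Python for brevity, and the name `fuzzy_find` is hypothetical. It uses the standard dynamic-programming table, with the first row held at zero so that a match may start anywhere in the text.

```python
def fuzzy_find(word, text, max_errors):
    """Return (end_index, errors) for the best approximate match of `word`
    inside `text`, or None if every candidate needs more than `max_errors`
    edits. Whitespace counts as an ordinary character."""
    word, text = word.lower(), text.lower()
    # prev[i] = fewest edits needed to turn word[:i] into some substring
    # of the text ending at the previous text position.
    prev = list(range(len(word) + 1))
    best = None
    for j, ch in enumerate(text, start=1):
        cur = [0]  # empty prefix matches anywhere at no cost
        for i in range(1, len(word) + 1):
            cost = 0 if word[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in the text
                           cur[i - 1] + 1))     # character missing from the text
        if cur[-1] <= max_errors and (best is None or cur[-1] < best[1]):
            best = (j, cur[-1])
        prev = cur
    return best

text1 = "Hello, this is my comp/et string to look at"
text2 = "Hello, this is my com plee string to look at"

print(fuzzy_find("complete", text1, 2) is not None)  # True: two errors allowed
print(fuzzy_find("complete", text1, 1) is not None)  # False: one error is too few
print(fuzzy_find("complete", text2, 2) is not None)  # True: extra space, missing 't'
print(fuzzy_find("complete", "a common word", 2))    # None: too far from "complete"
```

A small error budget keeps words like common from matching, but the threshold still has to be tuned per word length, so false positives remain possible for short words.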