Help me to create a Spelling Checker

  • Question

  • User-768741433 posted

    Hi,

    I have to create a web-based spelling checker for the Hindi language. I have about 3,00,000 (300,000) Hindi words.

    The spelling checker will perform these two tasks:

    1. Check whether a word is correct or not.

    2. If it is not correct, provide suggestions.
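A minimal sketch of task 1, assuming the word list fits in memory as a Python set. The helper names and sample words are hypothetical, not from the post. Both the dictionary and the input are normalized to Unicode NFC because the same Hindi word can be typed with different code point sequences, for example a precomposed nukta letter versus the base letter followed by a nukta.

```python
import unicodedata
from typing import Iterable, Set


def normalize(word: str) -> str:
    """Trim whitespace and convert to Unicode NFC, so canonically equivalent
    spellings (e.g. precomposed U+095B vs. U+091C + nukta U+093C) compare equal."""
    return unicodedata.normalize("NFC", word.strip())


def build_dictionary(words: Iterable[str]) -> Set[str]:
    """Hypothetical helper: an in-memory set of normalized, non-empty words."""
    return {normalize(w) for w in words if w.strip()}


def is_correct(word: str, dictionary: Set[str]) -> bool:
    """Task 1: a word is correct if its normalized form is in the dictionary."""
    return normalize(word) in dictionary


# Hypothetical sample; the last entry is ज़मीन written with a precomposed ज़.
dictionary = build_dictionary(["पानी", "कमल", "नमक", "\u095B\u092E\u0940\u0928"])
print(is_correct("पानी", dictionary))  # True
print(is_correct("पनी", dictionary))  # False
# Same word typed as ज + nukta; it still matches after normalization.
print(is_correct("\u091C\u093C\u092E\u0940\u0928", dictionary))  # True
```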

    I am creating a web service which will do this task. Various clients like Wordpress Plugins will be developed to consume this service.

    Is this the best approach, or should I do something else?

    Another question:

    It is easy to check whether a word is correct. But to provide suggestions, it looks like I have to perform many database queries for each word.

    The algorithm I have in mind for spelling suggestions is:


    1. Create a list of all the "Matras" (vowel signs) and "Alphabets" (letters).

    2. Add each "Matra" and "Alphabet" to the wrong word, then check whether the result exists in the database.

    3. If it is found, add it to the array of suggested words.

    4. After completing this loop, return the array of suggested words.
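The loop above can be sketched as follows, assuming "add" means inserting each character at every position of the word. The character lists are a hypothetical, incomplete sample, and the dictionary is an in-memory set rather than a database, so each candidate check is a hash lookup instead of a query.

```python
from typing import Iterable, List, Set

# Hypothetical, deliberately incomplete character lists. A real list would
# contain every Hindi "Matra" (vowel sign) and "Alphabet" (letter).
MATRAS = ["\u093E", "\u093F", "\u0940", "\u0941", "\u0942",
          "\u0947", "\u0948", "\u094B", "\u094C"]
ALPHABETS = ["क", "ख", "ग", "न", "प", "म", "ल"]


def insertion_suggestions(wrong_word: str, dictionary: Set[str],
                          characters: Iterable[str]) -> List[str]:
    """Insert each character at every position and keep the dictionary hits."""
    suggestions = set()
    for ch in characters:
        for i in range(len(wrong_word) + 1):
            candidate = wrong_word[:i] + ch + wrong_word[i:]
            if candidate in dictionary:  # in-memory lookup, not a DB query
                suggestions.add(candidate)
    return sorted(suggestions)


dictionary = {"पानी", "कमल", "कलम", "नमक"}  # hypothetical sample words
print(insertion_suggestions("पनी", dictionary, MATRAS + ALPHABETS))  # ['पानी']
```

For a word of n characters and k Matras/Alphabets this still generates k × (n + 1) candidates, but against an in-memory set that is cheap. Note that it only finds suggestions where a single character was left out.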


    But Hindi has many "Matras" and "Alphabets", so this approach makes a lot of database queries for every word. This is the main problem.
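If the words must stay in a database, one way to cut the number of round trips is to generate all candidates first and check them with a single `IN (...)` query per batch. This is a sketch using Python's built-in sqlite3 module; the `words` table, its schema and the sample data are assumptions for illustration.

```python
import sqlite3
from typing import List, Sequence

# Older SQLite builds allow at most 999 "?" parameters per statement, so the
# candidates are sent in chunks comfortably below that limit.
CHUNK_SIZE = 500


def existing_words(conn: sqlite3.Connection, candidates: Sequence[str]) -> List[str]:
    """Return the candidates found in the hypothetical `words` table, using one
    SELECT ... IN (...) per chunk instead of one query per candidate."""
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    found = set()
    for start in range(0, len(unique), CHUNK_SIZE):
        chunk = unique[start:start + CHUNK_SIZE]
        # Only "?" placeholders are formatted into the SQL; the words
        # themselves are passed as bound parameters.
        placeholders = ", ".join("?" for _ in chunk)
        rows = conn.execute(
            f"SELECT word FROM words WHERE word IN ({placeholders})", chunk
        )
        found.update(row[0] for row in rows)
    return sorted(found)


# Usage with an in-memory database and hypothetical sample words.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [(w,) for w in ["पानी", "कमल", "कलम", "नमक"]],
)
print(existing_words(conn, ["पानी", "पाानी", "नमक", "पानी"]))  # ['नमक', 'पानी']
```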

    Is there any alternative way to get spelling suggestions?
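One alternative, swapped in here rather than taken from this thread, is a "symmetric delete" index, the idea used by SymSpell-style spell checkers: precompute every one-character deletion of each dictionary word, then look up only the misspelled word and its own deletions. The number of lookups depends on the word's length, not on how many Matras and Alphabets Hindi has, at the cost of a larger in-memory index. All names and sample words below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def single_deletes(word: str) -> Set[str]:
    """Every string obtained by deleting exactly one character of `word`."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def build_delete_index(words: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each word, and each of its one-character deletions, to the word."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in words:
        index[word].add(word)
        for variant in single_deletes(word):
            index[variant].add(word)
    return dict(index)


def candidates(wrong_word: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Look up the word and its own deletions: at most len(wrong_word) + 1
    lookups, independent of the size of the Hindi character set."""
    found: Set[str] = set()
    for variant in {wrong_word} | single_deletes(wrong_word):
        found |= index.get(variant, set())
    return found


index = build_delete_index(["पानी", "कमल", "कलम", "नमक"])
print(sorted(candidates("पनी", index)))  # ['पानी']
print(sorted(candidates("कमन", index)))  # ['कमल', 'कलम']
```

The index catches a missing, extra, or replaced character. Its results are candidates rather than final answers: कलम in the example is two edits away from कमन, so you may want to rank or filter the set with an edit-distance check.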

    [NOTE: This application will only be used for Hindi, not for English.]

    Wednesday, June 9, 2010 8:47 AM

Answers

  • User-952121411 posted

    > I am creating a web service which will do this task. Various clients like Wordpress Plugins will be developed to consume this service.
    >
    > Is this the best approach? Or I should do something else?

    I will comment on this part. Most spell checkers consist of locally stored dictionary files or DLLs that do the spell checking. I think making a client issue a cross-boundary web service call just to check spelling may add too much overhead for this purpose.
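A minimal sketch of the locally stored dictionary idea in Python: the word list is read from a UTF-8 file (one word per line, an assumed format) the first time it is needed and then served from memory. The function name and file layout are hypothetical.

```python
import functools
import os
import tempfile
import unicodedata
from typing import FrozenSet


@functools.lru_cache(maxsize=None)
def load_dictionary(path: str) -> FrozenSet[str]:
    """Read a UTF-8 word list (assumed format: one word per line) the first
    time it is requested; later calls with the same path return the cached
    set without touching the file again."""
    with open(path, encoding="utf-8") as f:
        return frozenset(
            unicodedata.normalize("NFC", line.strip())
            for line in f
            if line.strip()
        )


# Usage with a throwaway file; a real deployment would point at the full list.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("पानी\nकमल\n\nनमक\n")
words = load_dictionary(tmp.name)
print("पानी" in words, len(words))  # True 3
os.remove(tmp.name)  # the cached set stays usable after the file is gone
```

A frozenset is returned so that callers sharing the cached object cannot modify it by accident.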

    Take a look at this project for some ideas on creating your own spell checker:

    NetSpell - Spell Checker for .NET:

    http://www.codeproject.com/KB/string/netspell.aspx

     

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, June 9, 2010 2:04 PM

All replies

  • User-1852462877 posted

    The link provided above is very good for understanding how this works for English.

    You will have to document similar rules for Hindi and then work on the code.

    The approach I can suggest is fairly complex, but it is what is mostly used for English.

    Regular expressions can be used to find suggestions.

    For that, you will have to find patterns in your words.

    Words that share a pattern will share the same regular expression. You will also have to generate the regular expressions at runtime (just once; then you can cache them).

    Once a regular expression has been generated, run it against the whole database to find the matching words; those are the suggestions.

    You will also have to include logic such as "the first character of the word is always assumed to be correct", which reduces your pattern matching to a single alphabet list. The link above calls this the suffix. If no match is found, assume the first character is incorrect and work on the "mid portion" of the word, and similarly on the "last portion".
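The reply doesn't spell out what the patterns look like, so the sketch below assumes one: a pattern is the word with one character replaced by the wildcard `.`, which matches words that differ in exactly one position. Patterns are built once per word and cached with `functools.lru_cache`. The first character is kept fixed on the first pass, and only if nothing matches is it allowed to change. The function names and sample words are hypothetical.

```python
import functools
import re
from typing import Iterable, List


@functools.lru_cache(maxsize=10_000)
def suggestion_pattern(word: str, first_char_fixed: bool = True) -> "re.Pattern[str]":
    """Build, once per (word, flag) and then cache, a regex matching words of
    the same length that differ from `word` in exactly one position.
    With first_char_fixed, position 0 is assumed correct and never varied."""
    start = 1 if first_char_fixed else 0
    alternatives = [
        re.escape(word[:i]) + "." + re.escape(word[i + 1:])
        for i in range(start, len(word))
    ]
    if not alternatives:
        return re.compile(r"(?!)")  # nothing to vary: a pattern that never matches
    return re.compile("(?:" + "|".join(alternatives) + ")")


def regex_suggestions(word: str, words: Iterable[str]) -> List[str]:
    """Scan the whole word list with the cached pattern; if nothing matches
    with the first character fixed, retry letting the first character change."""
    word_list = list(words)
    for first_char_fixed in (True, False):
        pattern = suggestion_pattern(word, first_char_fixed)
        matches = sorted(w for w in word_list if w != word and pattern.fullmatch(w))
        if matches:
            return matches
    return []


words = ["पानी", "कमल", "कलम", "नमक"]  # hypothetical sample word list
print(regex_suggestions("पानि", words))  # ['पानी'] (last matra wrong)
print(regex_suggestions("जमक", words))  # ['नमक'] (first character wrong)
```

Each request is a linear scan over the whole word list, as the reply describes, so with ~3,00,000 words every suggestion costs a full pass over the list.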

    Here is a very basic, simple example in ASP.NET/VB.

    The example accepts any input that contains one of the words cat, bat, or mat as a separate word:

    <%@ Page Language="vb" AutoEventWireup="false" %>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <html>
    <head><title></title>
    </head>
    <body>
    <form Id="Form1" RunAt="server">
    <asp:TextBox id="txtInput" runat="server"></asp:TextBox>
    <asp:RegularExpressionValidator Id="revInput" RunAt="server"
    ControlToValidate="txtInput"
    ErrorMessage="Please enter a valid value"
    ValidationExpression=".*\b[bcm]at\b.*"></asp:RegularExpressionValidator>
    <asp:Button Id="btnSubmit" RunAt="server" CausesValidation="True"
    Text="Submit"></asp:Button>
    </form>
    </body>
    </html>
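For reference, the same expression can be tried outside ASP.NET. RegularExpressionValidator requires the expression to match the whole input value, which is why the pattern is wrapped in `.*`; the sketch below reproduces that with Python's `re.fullmatch`. The function name is hypothetical.

```python
import re

# The ValidationExpression from the page above, unchanged.
PATTERN = re.compile(r".*\b[bcm]at\b.*")


def is_valid(value: str) -> bool:
    """Accept the value only if the expression matches all of it, i.e. the
    text contains "bat", "cat" or "mat" as a separate word."""
    return PATTERN.fullmatch(value) is not None


for text in ["cat", "the mat is red", "cats", "hat"]:
    print(repr(text), is_valid(text))
```

One caution when adapting this for Hindi: in Python, `\b` and `\w` do not treat Devanagari vowel signs (matras) as word characters, so a word boundary can appear in the middle of a Hindi word.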


    See Ya!

    Sunday, August 22, 2010 10:05 AM