How do I identify Latin languages (English, Spanish, French, etc.)?

  • Question

  • Hello everyone,

    I am handling a string that contains text in multiple languages. I need to split it by language and handle each part differently. If the languages use different character sets, that is easy. However, languages written in the Latin alphabet share the same characters (a-z, A-Z), and I don't know how to tell whether a word is, say, French, Spanish, or English.

    For example: tomato (English) and tomates (French). How do I tell these two words apart?
    Any ideas?

    Regards,
    Elton
    Monday, December 21, 2009 2:12 AM

Answers

  • What is it that you are really trying to do?
    Also, some languages use identical spellings, sometimes for the same word and sometimes for entirely different ones.
    Identifying the language from the spelling of an arbitrary word will be unreliable at best.

    Every application runs under a particular culture,
    and the CurrentCulture is easily identified through code.

    Mark the best replies as answers. "Fooling computers since 1971."
    • Marked as answer by Harry Zhu Monday, December 28, 2009 2:42 AM
    Monday, December 21, 2009 8:23 PM
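For reference, reading the culture in .NET might look like the minimal sketch below. It relies on the standard `System.Globalization.CultureInfo` class; the `CultureProbe` class and its method names are made up for illustration. Keep in mind that this reports the culture the application runs under, not the language of any particular string, so it only helps if the text can be assumed to match that culture.

```csharp
using System.Globalization;

// Hypothetical helper; the class and method names are illustrative only.
public static class CultureProbe
{
    // Formats a culture as "<culture name> (<two-letter language code>)".
    public static string Describe(CultureInfo culture)
    {
        return culture.Name + " (" + culture.TwoLetterISOLanguageName + ")";
    }

    // Describes the culture of the currently executing thread.
    public static string DescribeCurrent()
    {
        return Describe(CultureInfo.CurrentCulture);
    }
}
```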
  • Method 1: Take a single word and look it up in each language's dictionary. Problem: a word such as "no" appears in both the English and the Spanish dictionary.
    Method 2: Take several consecutive words and look each one up. This is more accurate, but it can still fail when a sentence ends in "no" or a similar shared word. It should take into account the percentage of words in the string segment that each dictionary contains. Bear in mind that a run of consecutive words can easily flow from one language into another when the string contains multiple languages.
    Method 3: Place an identifier within the files themselves, e.g. <Language>Spanish</Language>. This is 100% accurate. Given the proper schema, it can also identify locations containing text in other languages.

    It's easy to see that you are not dealing with an exact science here.
    BrianMackey.NET
    • Marked as answer by Harry Zhu Monday, December 28, 2009 2:42 AM
    Monday, December 21, 2009 8:32 PM
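Method 2 above could be sketched roughly as follows: split the segment into words, compute for each language the share of words found in its dictionary, and pick the highest share. Everything here is an assumption made for illustration, including the `LanguageGuesser` name, the `minShare` threshold, the separator list, and the idea that each dictionary is a lowercase `HashSet<string>` keyed by language name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; names and the threshold are illustrative assumptions.
public static class LanguageGuesser
{
    // Returns the language whose dictionary contains the largest share of the
    // words in the text, or "Unknown" if no language reaches minShare.
    public static string GuessLanguage(
        string text,
        IDictionary<string, HashSet<string>> dictionaries,
        double minShare = 0.5)
    {
        // Lowercase the text and split on whitespace and basic punctuation.
        string[] words = text
            .ToLowerInvariant()
            .Split(new[] { ' ', '\t', '\n', '\r', ',', '.', ';', ':', '!', '?' },
                   StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0) return "Unknown";

        string best = "Unknown";
        double bestShare = 0.0;
        foreach (var entry in dictionaries)
        {
            // Fraction of the words that this language's dictionary contains.
            double share = (double)words.Count(w => entry.Value.Contains(w)) / words.Length;
            if (share > bestShare)
            {
                bestShare = share;
                best = entry.Key;
            }
        }
        return bestShare >= minShare ? best : "Unknown";
    }
}
```

A tie between languages is not resolved in any meaningful way here (whichever language is enumerated first wins), which is the "no" problem from Method 1 again. The sketch also treats the whole segment as one language, so mixed-language strings would need to be windowed first.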

All replies

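  • There is no way to do this except to have a dictionary for each language and look the word up in them, e.g.:
    // Assumes EnglishDictionary and FrenchDictionary are word collections (e.g. HashSet<string>)
    // and that the Language enum has an Unknown member.
    Language IdentifyLanguage(string word)
    {
        if (EnglishDictionary.Contains(word))
            return Language.English;
        if (FrenchDictionary.Contains(word))
            return Language.French;
        // ...and so on for each supported language

        return Language.Unknown; // no dictionary contains the word
    }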

    With best regards, Yasser Zamani
    Monday, December 21, 2009 8:17 PM
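One possible variation on the dictionary lookup above is to return every language that contains the word instead of the first match, so that shared spellings show up as ambiguous rather than silently counting as English. This is only a sketch: `WordLookup`, `CandidateLanguages`, and the dictionary layout (lowercase word sets keyed by language name) are assumptions, not part of the original suggestion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; names and data layout are illustrative assumptions.
public static class WordLookup
{
    // Returns the names of all languages whose dictionary contains the word,
    // sorted by name. An empty list means no dictionary knows the word.
    public static List<string> CandidateLanguages(
        string word,
        IDictionary<string, HashSet<string>> dictionaries)
    {
        // Dictionaries are assumed to store lowercase words.
        string key = word.Trim().ToLowerInvariant();
        return dictionaries
            .Where(entry => entry.Value.Contains(key))
            .Select(entry => entry.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

A caller can then treat a single-element result as a confident match and fall back to surrounding words when the result has several entries.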
  • English:  Nova.  def: A highly energetic object found, thankfully, in very deep space.

    Spanish:  Nova.  def: A phrase meaning something is broken, or in English "No go."

    Mark the best replies as answers. "Fooling computers since 1971."
    Monday, December 21, 2009 8:40 PM