Replacing Non-English rows with English strings

  • Question

  • Hello!

    I am working on a data set in which one column contains a mix of English and Finnish sentences, one per row. I want to detect the Finnish sentences and replace them with a specific English string value. I did not find anyone who has faced a similar problem before. Would you please help?

    Another question: is it possible to use Google Translate or another tool so that the detected Finnish sentences are automatically translated into English?

    Sunday, January 10, 2016 12:29 PM

Answers

  • Hi 

    For all this, R is your friend. Use the R package "textcat"; it is able to detect up to 75 languages, I think.

    https://cran.r-project.org/web/packages/textcat/textcat.pdf

    download:

    https://cran.r-project.org/web/packages/textcat/index.html

    Example :

    library("textcat")
    textcat(c("Hello in English","Bonjour en francais", "Hallo auf Deutsch"))

    [1] "english" "french"  "german"
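To connect this to the original question, here is a minimal sketch of replacing the detected Finnish rows with a fixed string. The helper `replace_language`, the column name `comment`, the placeholder `"NON_ENGLISH"`, and the sample sentences are all made up for illustration. It assumes textcat's default profiles label Finnish text as "finnish" and return NA when no language is recognized. Detection on very short sentences can be unreliable, so check the results on your own data.

```r
# Replace every entry whose detected language equals `target` with a fixed
# placeholder. `lang` is the vector returned by a detector such as textcat().
replace_language <- function(text, lang, target = "finnish",
                             placeholder = "NON_ENGLISH") {
  stopifnot(length(text) == length(lang))
  hit <- !is.na(lang) & lang == target  # NA: language not recognized, keep row
  text[hit] <- placeholder
  text
}

# Hypothetical data; the column name `comment` is an assumption.
df <- data.frame(
  comment = c("This is an English sentence about the weather.",
              "Tämä on suomenkielinen lause säästä ja ilmasta."),
  stringsAsFactors = FALSE
)

# Run the detection only if textcat is installed.
if (requireNamespace("textcat", quietly = TRUE)) {
  detected <- textcat::textcat(df$comment)
  df$comment <- replace_language(df$comment, detected)
}
```

Keeping the replacement separate from the detection makes it easy to inspect `detected` first, or to swap in another detector later.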

    Once the language is detected, you can use curl to call the Google Translate or Bing Translator API.

    https://cran.r-project.org/web/packages/curl/index.html
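As a rough sketch of what such a call could look like with the curl package: the endpoint URL, the query parameters (`q`, `source`, `target`, `key`), and the response shape below are assumptions modelled on the Google Translate v2 REST interface, so verify them against the current API documentation. The helpers `build_translate_url` and `translate_text` are hypothetical names, and the jsonlite package is assumed to be installed for parsing the reply.

```r
# Build the request URL for a translate-style REST endpoint.
# The default endpoint and parameter names are assumptions; check the API docs.
build_translate_url <- function(text, api_key, source = "fi", target = "en",
    endpoint = "https://translation.googleapis.com/language/translate/v2") {
  query <- c(q = text, source = source, target = target, key = api_key)
  # Percent-encode each value so spaces and special characters are URL-safe.
  encoded <- vapply(query, function(v) utils::URLencode(v, reserved = TRUE),
                    character(1))
  paste0(endpoint, "?",
         paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

# Send the request and pull out the translated string.
# Assumes a JSON reply shaped like {"data": {"translations": [{"translatedText": ...}]}}.
translate_text <- function(text, api_key) {
  res <- curl::curl_fetch_memory(build_translate_url(text, api_key))
  if (res$status_code != 200) {
    stop("Translation request failed with HTTP status ", res$status_code)
  }
  body <- jsonlite::fromJSON(rawToChar(res$content))
  body$data$translations$translatedText
}
```

A call would then look like `translate_text("Hyvää huomenta", api_key = "YOUR_KEY")`, where the key is a placeholder for your own credential.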

    Reg

    • Proposed as answer by raymondl_msft Sunday, January 10, 2016 5:37 PM
    • Marked as answer by Bipul Mohanto Sunday, January 10, 2016 9:16 PM
    Sunday, January 10, 2016 1:30 PM

All replies

  • The textcat library works very nicely, but using the curl library seems problematic. RCurl looks like a possible alternative to the curl package, but implementing it is difficult.
    Wednesday, January 13, 2016 11:02 PM
  • I'm aware that there is a problem with HTTPS invocation on Azure ML. Are you using HTTPS for the translation API?

    Wednesday, January 13, 2016 11:38 PM