Non english letter

  • Question

  • Can AML handle strings with non-English letters? I have a dataset containing non-English letters, and they all come in as "?". What can I do?

    Tuesday, October 2, 2018 6:30 PM

All replies

  • Is it the only character affected? If so, you could replace the substituted '?' with the correct non-English character using a module such as Execute Python Script or Execute R Script in your experiment, with the appropriate text-processing libraries.
    Tuesday, October 2, 2018 8:57 PM
  • Yes, it's the only character. Do you have an example of that?
    Wednesday, October 3, 2018 6:30 AM
  • Here is a simple example. Assume you have a string containing the character to be replaced, say '?':

    string = 'V?gelement'

    The following statement returns a new string in which every '?' is replaced by a character whose Unicode code point is known:

    string2 = string.replace('?', chr(64))

    In this example, string2 will be 'V@gelement', because the symbol @ has Unicode code point 64 (decimal, the same as its ASCII code).

    Try this in an Execute Python Script module, replacing 64 with the code point of the character you want to recover.

    Good Luck!


    Wednesday, October 3, 2018 10:26 AM
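
    The replacement described in the reply above can be sketched for a whole column of values. This is only an illustration under stated assumptions: Python 3 is assumed (so chr() returns a Unicode character), the helper name restore_placeholder is hypothetical, and the lost letter is assumed to be 'ä' (code point 228, hex E4); the thread does not say which letter was actually lost.

```python
# Hypothetical helper: the name and the defaults are illustrative only,
# not part of Azure ML or any library.
def restore_placeholder(values, placeholder="?", codepoint=0xE4):
    """Replace every placeholder in each string with the character at codepoint."""
    # 0xE4 (decimal 228) is 'ä'; this choice is an assumption for the example.
    replacement = chr(codepoint)
    return [value.replace(placeholder, replacement) for value in values]


rows = ["V?gelement", "no placeholder here"]
restored = restore_placeholder(rows)
print(restored)
```

    Note that this replaces every '?', including genuine question marks, and it can only work when a single letter was lost. If several different letters were turned into '?', the original text cannot be recovered this way.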
  • Hi, 

    Azure ML supports only UTF-8 at the moment. Could you make sure that your input dataset is UTF-8 encoded?

    https://social.msdn.microsoft.com/Forums/azure/en-US/5e353db3-18ef-4816-919a-21e0810a1831/using-unicode-data-within-a-dataset?forum=MachineLearning

    Regards,
    Jaya

    Wednesday, October 3, 2018 5:50 PM
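
    Following the UTF-8 advice above, one possible way to prepare the data before uploading is to re-encode it. The sketch below is a minimal illustration, not an Azure ML feature: the helper name reencode_to_utf8 is hypothetical, and the source encoding cp1252 is an assumption, since the thread does not say how the original dataset was encoded.

```python
# Hypothetical helper: re-encodes raw bytes from a known source encoding to UTF-8.
# The default "cp1252" is an assumption; use the encoding your data really has.
def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    text = raw.decode(source_encoding)  # bytes -> str using the original encoding
    return text.encode("utf-8")         # str -> UTF-8 bytes


# Simulate a value as it would be stored in a cp1252-encoded file.
raw = "Vägelement".encode("cp1252")
utf8_bytes = reencode_to_utf8(raw)
print(utf8_bytes.decode("utf-8"))
```

    For a file on disk, the same idea applies by reading it with open(path, encoding=...) using the original encoding and writing it back out with encoding="utf-8". Decoding with the wrong source encoding will either raise an error or produce wrong characters, so the original encoding needs to be known.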