Reading UTF-8 characters through U-SQL


  • Hi,

    I am not able to read UTF-8 character data with the U-SQL script below.

    Please let me know if I need to add anything to the extractor parameters.


    @Source =
        EXTRACT be_id                       string,
                start_date                  string,
                end_date                    string,
                be_start_date               string,
                be_end_date                 string,
                be_name                     string,
                ul_work_location_ps_code    string,
                be_code                     string

        FROM "/Sample6.csv"
        USING Extractors.Csv();

    The file has the value "Asunsión Office" in the be_name column. If I remove this particular record, the script runs fine.

    Friday, October 27, 2017 1:29 PM

All replies

  • Is the file encoding UTF-8?

    • Edited by CateArcher Monday, October 30, 2017 7:07 PM
    Friday, October 27, 2017 4:54 PM
  • I am not sure about the file encoding, but we would like to read values like "Asunsión Office". I tried copying the original data into separate TSV/CSV files, but those could not be processed either.
    Monday, October 30, 2017 9:12 AM
  • Even after changing the file to UTF-8 encoding, the script still fails. The error is below.

    '1_SV1_Extract Error : '{"diagnosticCode":195887139,"severity":"Error","component":"RUNTIME","source":"User","errorId":
    "E_RUNTIME_USER_EXTRACT_ENCODING_ERROR","message":"Encoding error occured after processing 0 record(s) in the vertex' input split.","description":
    "RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXTRACT_INVALID_CHARACTER","message":"Invalid character for UTF-8 encoding in input stream.",
    "description":"Found invalid character for UTF-8 encoding in input.","resolution":"Correct the invalid character in the input file, 
    or correct encoding in extractor and try again.","helpLink":"","details":"

    • Edited by Chandu50859 Monday, October 30, 2017 10:09 AM
    Monday, October 30, 2017 10:00 AM
  • 1) You changed the file encoding? What was the file encoding originally?

    2) When you changed the file encoding, did you save the file under a new name, and then perhaps not update your query to use that new file name?

    Monday, October 30, 2017 7:44 PM
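
Since the original encoding is the open question in this thread, one way to answer it is to inspect the file's bytes locally before uploading. The Python sketch below is only an illustration: the function names diagnose_encoding and to_utf8 are made up for this example, and the guess that the file was saved as Windows-1252 (cp1252) is an assumption, not something confirmed above. It shows how the "ó" in "Asunsión Office" becomes a byte that is invalid in UTF-8, which would be consistent with the E_RUNTIME_USER_EXTRACT_INVALID_CHARACTER error, and how re-encoding changes that.

```python
def diagnose_encoding(data: bytes) -> str:
    """Return a rough label describing how the bytes appear to be encoded."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 (with BOM)"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that cannot be decoded.
        return f"not utf-8 (bad byte 0x{data[err.start]:02x} at offset {err.start})"


def to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes to UTF-8. The cp1252 default is an assumption."""
    return data.decode(source_encoding).encode("utf-8")


# Simulate the problem value as a Windows-1252 editor might have saved it.
sample = "Asunsi\u00f3n Office".encode("cp1252")
print(diagnose_encoding(sample))

# After re-encoding, the same value decodes cleanly as UTF-8.
fixed = to_utf8(sample)
print(diagnose_encoding(fixed))
```

To check the real file, read it in binary mode (for example open(path, "rb").read(), with path pointing at a local copy of Sample6.csv) and pass the bytes to diagnose_encoding. If the result is "not utf-8", either write the output of to_utf8 to a new file and point the query at that file, or keep the file as it is and correct the encoding in the extractor, as the error's resolution text suggests.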