Problem handling null values in the TSV file extractor

    Question

  • I'm getting the following error with the U-SQL extractor for a TSV file.

    "E_RUNTIME_USER_EXTRACT_COLUMN_CONVERSION_INVALID_ERROR",
      "message": "Invalid character when attempting to convert column data.",
      "description": "HEX: \"225D4F22\" Invalid character when converting input record.\nPosition: line 1, column 10.",
      "resolution": "Check the input for errors or use \"silent\" switch to ignore over(under)-sized rows in the input.\nConsider that ignoring \"invalid\" rows may influence job results and that types have to be nullable for conversion errors to be ignored.",

    The column data type in the extractor schema is float? to allow for nulls, and the source file in the Data Lake Store shows \N for null values.

    Update: I added the silent parameter to the extractor and the output file is now generating, but is there any other way to handle this? My concern is that the silent parameter may also hide other issues.
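
    For reference, a minimal sketch of the extract described above with the silent switch in place (the file path and column names here are hypothetical):

        @rows =
            EXTRACT id string,
                    measure float?               // nullable, so failed conversions can be ignored
            FROM "/input/data.tsv"
            USING Extractors.Tsv(silent:true);   // also skips over/under-sized rows

        OUTPUT @rows
        TO "/output/data.tsv"
        USING Outputters.Tsv();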

Answers

  • The error message indicates that you either have the wrong encoding specified or a value that was supposed to be a number is not a number after all. By default, the extractor assumes UTF-8 encoding. It appears to be taking the bytes 0x225D4F22 and trying to convert them into a float.

    The UTF-8 byte sequence 0x22 0x5D 0x4F 0x22 decodes to the four characters "]O" (a double quote, a right bracket, the letter O, and another double quote; the same as in ASCII, whereas read as UTF-16 it would be ≝伢), which is neither a valid float nor a null marker.

    I suggest checking that you have the right value for the right column type (or vice versa :)).
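
    If the file uses \N as its null marker, there are two more targeted alternatives to the silent switch, sketched below with a hypothetical file path and two-column schema. First, the built-in extractors take a nullEscape parameter that maps a chosen marker string to null without suppressing other errors:

        @rows =
            EXTRACT id string,
                    measure float?                       // must be nullable to receive nulls
            FROM "/input/data.tsv"
            USING Extractors.Tsv(nullEscape:@"\N",       // treat \N in the input as null
                                 encoding:Encoding.UTF8);

    Alternatively, extract the column as a string and convert it explicitly, which keeps full control over what counts as null:

        @raw =
            EXTRACT id string,
                    measure string                       // take the raw text first
            FROM "/input/data.tsv"
            USING Extractors.Tsv();

        @typed =
            SELECT id,
                   (measure == @"\N" ? (float?)null : (float?)float.Parse(measure)) AS measure
            FROM @raw;

    Unlike silent:true, both approaches leave genuine conversion errors visible, which addresses the concern above about hiding other issues.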


    Michael Rys
