none
Data type error -Extract column conversion invalid error

    Question

  • I have simple command, which works fine if all columns are extracted as string, but problem occurs when I change data type.

    DECLARE @in  string = "/Input/CustomerSalesDataLake.csv";
    DECLARE @out string = "/Output/CustomerSalesDataLakeOutput.csv";

    @log =
        EXTRACT  CustomerId         string,
                 CustomerName       string,
                 Sales              float,
    TimeStamp DateTime
      FROM @in
      USING Extractors.Text(delimiter:';', quoting:false);

    OUTPUT @log   
        TO @out
        USING Outputters.Csv();


    //  File content------------------------------
    //  "/Input/CustomerSalesDataLake.csv";
    //  CustomerId CustomerName Sales DateTime
    //  1 Ford 4,17 2016-12-05T13:08:30.044Z
    //  2 Nissan 7,90 2016-12-05T13:08:30.044Z

    VertexFailedFast: Vertex failed with a fail-fast error
    E_RUNTIME_USER_EXTRACT_COLUMN_CONVERSION_INVALID_ERROR: Invalid character when attempting to convert column data.
    <label class="msportalfx-text-label" data-bind="text: descriptionLabel" style="font-family:az_ea_font, wf_segoe-ui_normal, 'Segoe UI', 'Segoe WP', Tahoma, Arial, sans-serif;line-height:normal;text-transform:uppercase;color:#2d3239;">DESCRIPTION</label>
    HEX: "2253616C657322" Invalid character when converting input record. Sales".
    <label class="msportalfx-text-label" data-bind="text: resolutionLabel" style="font-family:az_ea_font, wf_segoe-ui_normal, 'Segoe UI', 'Segoe WP', Tahoma, Arial, sans-serif;line-height:normal;text-transform:uppercase;color:#2d3239;">RESOLUTION</label>
    Check the input for errors or use "silent" switch to ignore over(under)-sized rows in the input. Consider that ignoring "invalid" rows may influence job results and that types have to be nullable for conversion errors to be ignored.
    <label class="msportalfx-text-label" data-bind="text: detailsLabel" style="font-family:az_ea_font, wf_segoe-ui_normal, 'Segoe UI', 'Segoe WP', Tahoma, Arial, sans-serif;line-height:normal;text-transform:uppercase;color:#2d3239;">DETAILS</label>
    ============================================================================================
    Delimiter{HEX}:3B
    HEX:437573746F6D657249643B437573746F6D65724E616D653B53616C65733B4461746554696D65
                                                                    ^
    TEXT:CustomerId;CustomerName;Sales;DateTime
                                       ^
    
    ============================================================================================
    
    <label class="msportalfx-text-label" data-bind="text: titleLabel" style="font-family:az_ea_font, wf_segoe-ui_normal, 'Segoe UI', 'Segoe WP', Tahoma, Arial, sans-serif;line-height:normal;text-transform:uppercase;color:#2d3239;">ERROR</label>
    E_RUNTIME_USER_EXTRACT_COLUMN_CONVERSION_INVALID_ERROR
    <label class="msportalfx-text-label" data-bind="text: messageLabel" style="font-family:az_ea_font, wf_segoe-ui_normal, 'Segoe UI', 'Segoe WP', Tahoma, Arial, sans-serif;line-height:normal;text-transform:uppercase;color:#2d3239;">MESSAGE</label>
    Invalid character when attempting to convert column data.
    <label class="msportalfx-text-label" data-bind="text: urlLabel" style="font-family:az_ea_font, wf_segoe-ui_normal, 'Segoe UI', 'Segoe WP', Tahoma, Arial, sans-serif;line-height:normal;text-transform:uppercase;color:#2d3239;">JOB URL</label>

    https://smartblocklakedev.azuredatalakeanalytics.net/jobs/d9c5e737-cde0-44e6-8e8e-b33f3496d7bf?api-version=2015-11-01-preview



    Kenny_I

    Wednesday, December 7, 2016 3:01 PM

All replies

  • You'll have to extract the column as a string, then convert using in-line C#:

    @log =
        SELECT CustomerId, CustomerName,
        float.Parse(Sales, CultureInfo.InvariantCulture) AS Sales
        FROM @log;
    



    Wednesday, December 7, 2016 3:35 PM
  • DECLARE @in  string = "/Input/CustomerSalesDataLake.csv";
    DECLARE @out string = "/Output/CustomerSalesDataLakeOutput.csv";

    @log =
        EXTRACT  CustomerId         string,
                 CustomerName       string,
                 float.Parse(Sales, CultureInfo.InvariantCulture) AS Sales,
        TimeStamp DateTime
      FROM @in
      USING Extractors.Text(delimiter:';', quoting:false);

    OUTPUT @log   
        TO @out
        USING Outputters.Csv();

    ----

    DESCRIPTION
    Invalid syntax found in the script.
    RESOLUTION
    Correct the script syntax, using expected token(s) as a guide.
    DETAILS
    at token 'float', line 7
    near the ###:
    **************
    E @out string = "/Output/CustomerSalesDataLakeOutput.csv";

    @log =
        EXTRACT  CustomerId         string,
                 CustomerName       string,
                  ### float.Parse(Sales, CultureInfo.InvariantCulture) AS Sales,
    TimeStamp DateTime
      FROM @in
      USING Extractors.Text(delimiter:';', quoting:false);

    OUTPUT @l
    ERROR
    E_CSC_USER_SYNTAXERROR
    MESSAGE
    syntax error. Expected one of: identifier quoted-identifier
    JOB URL
    https://smartblocklakedev.azuredatalakeanalytics.net/jobs/da7d83db-1142-4d84-b2ba-d5c1bb63d153?api-version=2015-11-01-preview
    • Edited by Kenny_I Thursday, December 8, 2016 7:26 AM
    Wednesday, December 7, 2016 4:16 PM
  • Right, you cannot apply C# code directly in the EXTRACT statement. You need to do that in a subsequent SELECT statement.

    Remember that your U-SQL script is just a statement of the logical series of steps. It is then compiled and optimized, so you don't have to worry about reducing the number of statements as you would do in a traditional database system.

    Thursday, December 8, 2016 2:09 PM
  • I'm getting error with expressions. I also got error with using timestamp type for Timestamp, but better to concentrate to Sales column type, which I need for aggregation calculation.

    //CODE
    DECLARE @in  string = "/Input/CustomerSalesDataLake.csv";
    DECLARE @out string = "/Output/CustomerSalesDataLakeOutput.csv";

    @log =
        EXTRACT  CustomerId         string,
                 CustomerName          string,
                 Sales      string,
        TimeStamp      string
      FROM @in
      USING Extractors.Text(delimiter:';', quoting:false);

    @query =
        SELECT CustomerId,
          CustomerName,
                   float.Parse(Sales, CultureInfo.InvariantCulture) AS Sales,
           TimeStamp
      FROM @log;

    OUTPUT @query   
        TO @out
        USING Outputters.Csv();

    //CustomerSalesDataLake.csv
    CustomerId;CustomerName;Sales;DateTime
    1;Ford;4,17;2016-12-05T13:08:30.044Z
    2;Nissan;7,90;2016-12-05T13:08:30.044Z

    //ERROR
    VertexFailedFast: Vertex failed with a fail-fast error
    E_RUNTIME_USER_EXPRESSIONEVALUATION: Error while evaluating expression float.Parse(Sales, CultureInfo.InvariantCulture)
    DESCRIPTION
    Inner exception from user expression: Input string was not in a correct format. Current row dump: CustomerId: "CustomerId"
     CustomerName: "CustomerName"
     Sales: "Sales"
     TimeStamp: "DateTime"
    RESOLUTION
    DETAILS
    ==== Caught exception System.FormatException

       at System.Number.ParseSingle(String value, NumberStyles options, NumberFormatInfo numfmt)

       at ___Scope_Generated_Classes___.SqlFilterTransformer_3.Process(IRow row, IUpdatableRow output) in d:\data\CCS\jobs\7791cf60-e05f-4398-ae13-1f83974444df_v0\__ScopeCodeGen__.dll.cs:line 75
    ERROR
    E_RUNTIME_USER_EXPRESSIONEVALUATION
    MESSAGE
    Error while evaluating expression float.Parse(Sales, CultureInfo.InvariantCulture)
    JOB URL
    https://smartblocklakedev.azuredatalakeanalytics.net/jobs/7791cf60-e05f-4398-ae13-1f83974444df?api-version=2015-11-01-preview

     

    Kenny_I

    Sunday, December 11, 2016 3:09 PM
  • There are two problems with your script:

    • Your CSV file has a header line, so you'll have to add "skipFirstNRows:1" to your Extractor parameters.
    • Your Sales field is apparently not formatted in the invariant culture (which expects "." as a decimal separator), so you'll have to change that to the appropriate culture, e.g. float.Parse(Sales, new CultureInfo("fr-FR"))
    Monday, December 12, 2016 6:25 PM
  • How to add skipFirstNRows:1? Got invalid syntax error.

    USING Extractors.Text(delimiter:';', quoting:false, skipFirstNRows:1);


    Kenny_I

    Wednesday, December 14, 2016 2:25 PM
  • Can you please show the full error message?

    Also, are you running this locally or on the cluster? If you run it locally, please make sure you have the latest version of the tool.

    Here is the release note explaining the use of skipFirstNRows.


    Michael Rys

    Friday, December 16, 2016 8:35 PM
    Moderator