none
How to handle NULLs in source csv data

    Question

  • Just getting started with U-SQL and, when attempting to process my source data file, it is failing with the following error:

      "message": "Failure when attempting to convert NULL column data.",
      "description": "Can not convert NULL string to proper type.\nPosition: line 2, column 7.",
      "resolution": "Check the input for errors or use \"silent\" switch to ignore over(under)-sized rows in the input.\n

    Here's a sample of my data:

    US1TXTV0133,20140101,SNOW,0,,,N,
    USC00178998,20140101,TMAX,-22,,,7,0700
    USC00178998,20140101,TMIN,-283,,,7,0700

    And here's my query:

    @searchlog =
        EXTRACT StationId string,
                ObservationDateKey int,
                ElementType string,
                ElementValue double,
                MeasurementFlag string,
                QualityFlag string,
                SourceFlag string,
                ObservationTimeKey int
                
        FROM "/noaa/2014/01/01/20140101.csv"
        USING Extractors.Csv();
    
    OUTPUT @searchlog   
        TO "/output/SearchLog-first-u-sql.tsv"
    USING Outputters.Tsv();
    So, where, syntactically, does the "silent" switch belong; and is there something else I need to do to mark these columns as nullable?


    Jason

    Wednesday, November 11, 2015 1:01 PM

Answers

  • You can treat the columns as .NET types, and so, for things which are legitimately nullable, you can use the type int? for instance.

    If nulls are not generally expected, but show up in the occasional record erroneously, and you are ok with not keeping the records for subsequent processing, you can use silent:true on your extractor. For more details, check out the details on the built in extractors in this documentation: https://msdn.microsoft.com/en-us/library/azure/mt621366.aspx

    --mw


    Program Manager -- hadoop -- http://blogs.msdn.com/mwinkle

    Thursday, November 12, 2015 3:46 AM

All replies

  • You can treat the columns as .NET types, and so, for things which are legitimately nullable, you can use the type int? for instance.

    If nulls are not generally expected, but show up in the occasional record erroneously, and you are ok with not keeping the records for subsequent processing, you can use silent:true on your extractor. For more details, check out the details on the built in extractors in this documentation: https://msdn.microsoft.com/en-us/library/azure/mt621366.aspx

    --mw


    Program Manager -- hadoop -- http://blogs.msdn.com/mwinkle

    Thursday, November 12, 2015 3:46 AM
  • That makes perfect sense and works like a charm. Thank you!

    Jason


    Thursday, November 12, 2015 1:36 PM