none
Extract JSON lines in U-SQL

    Question

  • I have data stored as JSON object per line in files. What is a good method to extract this in a U-SQL script?

    I've got it working using a Text Extractor (see code below) however the JSON objects get big and I'm running into the 128KB size limit for strings. Any help would be appreciated thanks.

    Example file:

    { "prop1": "abc", "prop2": "xyz" }
    { "prop1": "def", "prop2": "uvw" }



    U-SQL:
       
    //Read (JSON Lines) line by line
    @dataAsStrings = EXTRACT jsonObjStr string
            FROM @INPUT_FILE
            USING Extractors.Text(delimiter:'\n');
    
    //Use the JsonTuple function to get the Json Token of the string so it can be parsed later with Json .NET functions    
    @jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonObjStr, "prop1", "prop2") AS rec
            FROM @dataAsStrings;
    
    //Extract the fields from the Json object.
    @json = SELECT rec["prop1"] AS prop1, rec["prop2"] AS prop2
            FROM @jsonify;

    Saturday, November 26, 2016 2:33 AM

All replies

  • I wrote a custom extractor to ingest text-based JSON files that don't have a JSON schema (i.e. Array). I have tested it with 100 - 100 MB files and it seems to work fine. If you are interested in the code let me know. I am trying to figure out Git but have made little progress.

    -Carolus Holman

    Sunday, December 4, 2016 7:50 PM
  • Thanks Carolus, I'd be keen to see the code yes. Can you put it up on github or bitbucket?
    Thursday, April 20, 2017 3:34 AM
  • Carolus,

    Did you ever get this onto Git? I would like to see how you did this.

    thanks

    Thursday, January 4, 2018 6:28 PM