none
How do I flatten JSON data into CSV data using USQL?

    Question

  • I can get some data out using Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple, but I'm having trouble flattening the whole file.

    Here's the file format I'm working with:

    {
     "SourceUrl":"http://www.unittest.org/test.html",
     "Title":"Unit Test File",
     "Guest":"Unit Test Guest",
     "PublishDate":"2017-05-15T00:00:00",
     "TranscriptionSections":[  
        {  
         "SectionStartTime":"00:00:03",
         "Sentences":[  
            {  
               "Text":"Intro."
            },
            {  
               "Text":"Sentence one"
            },
            {  
               "Text":"Sentence two"
            }
         ]
      },
      {  
         "SectionStartTime":"00:04:46",
         "Sentences":[  
            {  
               "Text":"Sentence three"
            },
            {  
               "Text":"Sentence four"
            }
         ]
      }
     ],
     "Categories":null
    }

    What I'm trying to get is a row per text (5 of them) including it's 'SectionStartTime' and all top level properties ('PublishDate', 'Guest'...).

    So far I can get a row per 'SectionStartTime' using this:

    USE econosphere;
    
    REFERENCE ASSEMBLY [Newtonsoft.Json];
    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
    
    DECLARE @in string="adl://abc.azuredatalakestore.net/data/20170515UnitTest.json";
    
    DECLARE @out 
    string="adl://abc.azuredatalakestore.net/processed/20170515UnitTest.csv";
    
    @ep = EXTRACT
    Title string,
    SourceUrl string,
    Guest string,
    PublishDate DateTime,
    TranscriptionSections string
    FROM @in
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
    
    @epAndTransctripts =
        SELECT Title,
            SourceUrl,
            Guest,
            PublishDate,
            Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(TranscriptionSections).Values AS TranscriptionSections_arr
        FROM @ep;
    
    @all =
        SELECT
            Title,
            SourceUrl,
            Guest,
            PublishDate,
            Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(sects)["SectionStartTime"] AS TranscriptionSectionTimes
    
        FROM @epAndTransctripts
        CROSS APPLY
            EXPLODE(TranscriptionSections_arr) AS t(sects);
    
    
    OUTPUT @all
    TO @out 
    USING Outputters.Csv();


    Wednesday, May 17, 2017 9:27 PM

All replies

  • Hi,

    You can do this with MultiLevelJsonExtractor as below...

    REFERENCE ASSEMBLY master.[Newtonsoft.Json];
    REFERENCE ASSEMBLY master.[Microsoft.Analytics.Samples.Formats]; 

    USING Microsoft.Analytics.Samples.Formats.Json;

    DECLARE @in string=@"Test.json";
    DECLARE @out string=@"Test.csv";

    @result =
        EXTRACT SourceUrl string,
                Title string,
                Guest string,
                PublishDate string,
                SectionStartTime string,
                Text string
        FROM @in
        USING new MultiLevelJsonExtractor("TranscriptionSections[*].Sentences[*]",
              false,
              "SourceUrl",
              "Title",
              "Guest",
              "PublishDate",
              "SectionStartTime",
              "Text");         

    OUTPUT @result
    TO @out 
    USING Outputters.Csv();

    -Parthiban

    Thursday, June 1, 2017 1:50 PM