none
Copy JSON files from Blob storage as csv to Data lake

    Question

  • Hi,

    I'm using copy activity to move JSON files from Blob storage. I need to convert the data to csv or similar text file for further processing. In data lake analytics I managed to do it, but there was a problem with string size limit.

    Is it prossibly to have conversion done in copy activity, by setting input dataset as JsonFormat and output as TextFormat? If yes, then how should I set up jsonNodeReference and jsonPathDefinition to get desired output?

    I have tried with examples in  https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#jsonformat-example, especially Sample 2: cross apply multiple objects with the same pattern from array but it seems I'm missing something

    My Json files look like this:

    {
     "Id": "1234",
     "Timestamp": "2018-01-01T00:00:00.000000+00:00",
     "Includes": [
            {
                  "SubId": "123",
                  "Data": [
                            { "X": "1", "Y": "1", "Z": "1"},

                            { "X": "2", "Y": "2", "Z": "2"}

                   ]

             },

             {
                  "SubId": "456",
                  "Data": [
                            { "X": "3", "Y": "3", "Z": "3"},

                            { "X": "4", "Y": "4", "Z": "4"}

                   ]

             }

        ]

    }

    The output dataset would look like this:

    Id, Timestamp, Subid, X, Y, Z

    1234, "2018-01-01T00:00:00.000000+00:00", 123, 1, 1, 1

    1234, "2018-01-01T00:00:00.000000+00:00", 123, 2, 2, 2

    1234, "2018-01-01T00:00:00.000000+00:00", 456, 3, 3, 3

    1234, "2018-01-01T00:00:00.000000+00:00", 456, 4, 4, 4

    Br,

    Friday, June 1, 2018 6:06 AM

Answers

  • Hi VairavanS

    In ADLA, my problem was extracting the files in 3 columns at first as strings (Id, Timestamp, Includes) which caused the last column to be over size limit.

    I found example for my problem in

    https://github.com/Azure/usql/blob/master/Examples/JSONExamples/JSONExamples/6-MultiLevelExtractionJSON.usql

    and managed to run the file transformation in ADLA.

    Thanks for suggestions


    • Edited by okmijn Monday, June 4, 2018 9:24 AM
    • Marked as answer by okmijn Monday, June 4, 2018 9:24 AM
    Monday, June 4, 2018 5:53 AM

All replies

  • When you transform a JSON by data factory, it provides cross-apply along single array field. Since you have two array fields, I dont think you would be able to transform it as you determine. Please take a look at the below images and see which suits you the best 

    1. Cross-apply along includes.Data array without SubId

    2. Cross apply along .includes array

    I think the best way to do this transform is using ADLA, Can you please explain more about the issue that you are facing?
    Friday, June 1, 2018 9:29 PM
    Moderator
  • Hi VairavanS

    In ADLA, my problem was extracting the files in 3 columns at first as strings (Id, Timestamp, Includes) which caused the last column to be over size limit.

    I found example for my problem in

    https://github.com/Azure/usql/blob/master/Examples/JSONExamples/JSONExamples/6-MultiLevelExtractionJSON.usql

    and managed to run the file transformation in ADLA.

    Thanks for suggestions


    • Edited by okmijn Monday, June 4, 2018 9:24 AM
    • Marked as answer by okmijn Monday, June 4, 2018 9:24 AM
    Monday, June 4, 2018 5:53 AM
  • I am glad that you got it to work.
    Monday, June 4, 2018 5:30 PM
    Moderator