Copy JSON files from Blob storage as CSV to Data Lake

Question
Hi,
I'm using a copy activity to move JSON files from Blob storage. I need to convert the data to CSV or a similar text file for further processing. In Data Lake Analytics I managed to do it, but ran into a problem with the string size limit.
Is it possible to have the conversion done in the copy activity, by setting the input dataset as JsonFormat and the output as TextFormat? If yes, how should I set up jsonNodeReference and jsonPathDefinition to get the desired output?
I have tried the examples in https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#jsonformat-example, especially Sample 2 (cross apply multiple objects with the same pattern from array), but it seems I'm missing something; my attempt is sketched below, after the sample data.
My JSON files look like this:
{
    "Id": "1234",
    "Timestamp": "2018-01-01T00:00:00.000000+00:00",
    "Includes": [
        {
            "SubId": "123",
            "Data": [
                { "X": "1", "Y": "1", "Z": "1" },
                { "X": "2", "Y": "2", "Z": "2" }
            ]
        },
        {
            "SubId": "456",
            "Data": [
                { "X": "3", "Y": "3", "Z": "3" },
                { "X": "4", "Y": "4", "Z": "4" }
            ]
        }
    ]
}
The output dataset would look like this:
Id, Timestamp, SubId, X, Y, Z
1234, "2018-01-01T00:00:00.000000+00:00", 123, 1, 1, 1
1234, "2018-01-01T00:00:00.000000+00:00", 123, 2, 2, 2
1234, "2018-01-01T00:00:00.000000+00:00", 456, 3, 3, 3
1234, "2018-01-01T00:00:00.000000+00:00", 456, 4, 4, 4
Br,
Friday, June 1, 2018 6:06 AM
Answers
Hi VairavanS,
In ADLA, my problem was that extracting the files into 3 columns as strings at first (Id, Timestamp, Includes) caused the last column to exceed the string size limit.
I found an example for my problem at
https://github.com/Azure/usql/blob/master/Examples/JSONExamples/JSONExamples/6-MultiLevelExtractionJSON.usql
and managed to run the file transformation in ADLA.
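In case it helps someone else, the core of my script ended up roughly like this (a sketch along the lines of the linked example, assuming the sample JSON assemblies from that repo are built and registered; the file paths are placeholders):

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// One row per element of Includes[*].Data[*]; columns not found at that
// level (Id, Timestamp, SubId) are looked up further up the tree, so no
// column ever has to hold the whole Includes array as a single string.
@rows =
    EXTRACT Id string,
            Timestamp string,
            SubId string,
            X string,
            Y string,
            Z string
    FROM "/input/sample.json"   // placeholder path
    USING new MultiLevelJsonExtractor("Includes[*].Data[*]",
                                      false,
                                      "Id", "Timestamp", "SubId", "X", "Y", "Z");

OUTPUT @rows
TO "/output/sample.csv"   // placeholder path
USING Outputters.Csv(outputHeader : true);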
Thanks for the suggestions.
Monday, June 4, 2018 5:53 AM
All replies
When you transform JSON with Data Factory, the copy activity provides cross-apply along a single array field. Since you have two nested array fields, I don't think you would be able to transform it exactly as you describe. Please take a look at the two options below and see which suits you best:
1. Cross-apply along the Includes[*].Data array, without SubId
2. Cross-apply along the Includes array
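For option 2, the format section of the input dataset would look roughly like this (a sketch; Data comes out as a raw JSON string that would still need parsing downstream):

"format": {
    "type": "JsonFormat",
    "filePattern": "setOfObjects",
    "jsonNodeReference": "$.Includes",
    "jsonPathDefinition": {
        "Id": "$.Id",
        "Timestamp": "$.Timestamp",
        "SubId": "SubId",
        "Data": "Data"
    }
}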
I think the best way to do this transform is using ADLA. Can you please explain more about the issue that you are facing?
- Edited by VairavanS (Azure) Friday, June 1, 2018 9:32 PM
- Proposed as answer by VairavanS (Azure) Friday, June 1, 2018 9:32 PM
Friday, June 1, 2018 9:29 PM
I am glad that you got it to work.
Monday, June 4, 2018 5:30 PM