
I am a newbie to ADL & JSON files. I am trying to load a JSON file into an ADL table.

My JSON file structure is:

{ABCD:{Time:"", Date:"", ProcessingTime:"", ProcessName:""}},
{ABCD:{Date:"", ProcessingTime:"", ProcessName:""}},
{ABCD:{ProcessingTime:"", ProcessName:""}},
{ABCD:{Time:"", Date:"", ProcessingTime:"", ProcessName:""}},

My table has all 4 columns (Time, Date, ProcessingTime, and ProcessName).

First, I tried writing it to a CSV file using U-SQL statements before writing it into a table, but the generated CSV output contained only blank records.

Any help is appreciated. Can I do this through ADF as well? I would like to have this as a scheduled job.

Below is the U-SQL code I used to write the CSV file.

CREATE ASSEMBLY IF NOT EXISTS [Newtonsoft.Json] FROM 
"C:/Test/Assemblies/Newtonsoft.Json.dll";
CREATE ASSEMBLY IF NOT EXISTS [Microsoft.Analytics.Samples.Formats] FROM 
"C:/ADL/Assemblies/Microsoft.Analytics.Samples.Formats.dll";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @path string = @"C:\Test\";
DECLARE @input string = @path + @"sample_data1.json";
DECLARE @to string = @path + @"output.csv";

@jsonFile =
    EXTRACT Time string,
            Date string,
            ProcessingTime string,
            ProcessName string
    FROM @input
    USING new JsonExtractor();

OUTPUT @jsonFile
TO @to
USING Outputters.Csv();

Cheers!

2 Answers

That file does not contain a valid JSON document; it looks like one JSON object per line. ADL can handle files with one JSON object per line, but each object must sit on its own line without any additional separators, so you should remove the trailing , from each line. Like this:

{"ABCD":{"Time":"", "Date":"", "ProcessingTime":"", "ProcessName":""}}
{"ABCD":{"Date":"", "ProcessingTime":"", "ProcessName":""}}
{"ABCD":{"ProcessingTime":"", "ProcessName":""}}
{"ABCD":{"Time":"", "Date":"", "ProcessingTime":"", "ProcessName":""}}

You then cannot use the JsonExtractor directly; instead, use the built-in text extractor to read each line as a raw string, and the JsonTuple method to parse it as JSON:

CREATE ASSEMBLY IF NOT EXISTS [Newtonsoft.Json] FROM 
"C:/Test/Assemblies/Newtonsoft.Json.dll";
CREATE ASSEMBLY IF NOT EXISTS [Microsoft.Analytics.Samples.Formats] FROM 
"C:/ADL/Assemblies/Microsoft.Analytics.Samples.Formats.dll";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @path string = @"C:\Test\";
DECLARE @input string = @path + @"sample_data1.json";
DECLARE @to string = @path + @"output.csv";

// Read each line into a single column; the '\b' delimiter does not
// occur in the data, so the whole line lands in one string.
@RawExtract =
    EXTRACT [RawString] string
    FROM @input
    USING Extractors.Text(delimiter : '\b', quoting : false);

// Parse each raw line into a JSON tuple.
@ParsedJSONLines =
    SELECT JsonFunctions.JsonTuple([RawString]) AS JSONLine
    FROM @RawExtract;

// Parse the nested "ABCD" object.
@jsonObjects =
    SELECT JsonFunctions.JsonTuple(JSONLine["ABCD"]) AS Abcd
    FROM @ParsedJSONLines;

@result =
    SELECT Abcd["Time"] AS Time,
           Abcd["Date"] AS Date,
           Abcd["ProcessingTime"] AS ProcessingTime,
           Abcd["ProcessName"] AS ProcessName
    FROM @jsonObjects;

OUTPUT @result
TO @to
USING Outputters.Csv();
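
If the end goal is a managed U-SQL table rather than a CSV file, the same rowset can also be inserted directly. This is only a sketch; the database, schema, and table names (MyDb, dbo, ProcessLog) are hypothetical placeholders:

CREATE DATABASE IF NOT EXISTS MyDb;

// Hypothetical table matching the four extracted columns;
// U-SQL tables require a clustered index and a distribution scheme.
CREATE TABLE IF NOT EXISTS MyDb.dbo.ProcessLog
(
    Time string,
    Date string,
    ProcessingTime string,
    ProcessName string,
    INDEX idx CLUSTERED (ProcessName ASC) DISTRIBUTED BY HASH (ProcessName)
);

INSERT INTO MyDb.dbo.ProcessLog
SELECT * FROM @result;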

Based on your additional clarification in the comments to Peter's reply:

First, you cannot use U-SQL directly to insert data into Azure Table storage. You would have to use Azure Data Factory (ADF) to move the cleaned/transformed files from ADLS to Azure Table.
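
For illustration, an ADF (v1) copy pipeline for that movement could look roughly like the sketch below; the dataset names and the partition-key column are hypothetical assumptions, not values from your setup:

{
    "name": "CopyAdlsToAzureTable",
    "properties": {
        "activities": [
            {
                "name": "AdlsCsvToAzureTable",
                "type": "Copy",
                "inputs": [ { "name": "AdlsCleanedOutput" } ],
                "outputs": [ { "name": "AzureTableTarget" } ],
                "typeProperties": {
                    "source": { "type": "AzureDataLakeStoreSource" },
                    "sink": {
                        "type": "AzureTableSink",
                        "azureTablePartitionKeyName": "ProcessName"
                    }
                },
                "scheduler": { "frequency": "Day", "interval": 1 }
            }
        ],
        "start": "2017-01-01T00:00:00Z",
        "end": "2018-01-01T00:00:00Z"
    }
}

ADF v1 also has a DataLakeAnalyticsU-SQL activity type that can run the U-SQL script itself on the same schedule, which would cover the scheduled-job part of your question.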

The problem I see with the , above is that the JSON documents themselves also contain , to separate their properties, so simple approaches such as using , as a row or column delimiter would fail. What you can do is write something like the following (replacing the EXTRACT in the above script):

@RawExtract =
    EXTRACT [RawString] string
    FROM @input
    USING Extractors.Text(delimiter : '\b', quoting : false);

// Drop the trailing comma from each row.
@RawExtract =
    SELECT RawString.TrimEnd(',') AS RawString
    FROM @RawExtract;

to drop the last character of the row (assuming it is a ,; alternatively, you could write some other C# expression that finds the position of the last comma and uses String.Substring instead of String.TrimEnd). This assumes that each JSON document fits on a single row and within the 128 kB limit of the string data type.
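
For reference, a minimal sketch of that Substring variant, assuming each row either ends with a comma or should pass through unchanged:

// Remove the final comma when present; leave other rows as-is.
@RawExtract =
    SELECT RawString.EndsWith(",")
               ? RawString.Substring(0, RawString.LastIndexOf(','))
               : RawString
           AS RawString
    FROM @RawExtract;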

Alternatively, you would have to write a custom extractor that understands your file format completely and operates on input.BaseStream, with the extractor UDO property AtomicFileProcessing set to true. There are some example extractors available at the GitHub site linked from http://usql.io that may help with that, but I would suggest trying the above suggestion first.
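
For orientation, a skeleton of such a custom extractor might look like this; it is only a sketch, the class name is made up, and the line-by-line loop is a placeholder for logic that really understands the format:

using System.Collections.Generic;
using System.IO;
using Microsoft.Analytics.Interfaces;

// AtomicFileProcessing = true makes U-SQL hand the whole file to one
// vertex, so documents are never split across extent boundaries.
[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
public class MultiLineJsonExtractor : IExtractor
{
    public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
    {
        using (var reader = new StreamReader(input.BaseStream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Placeholder: emit each line minus the trailing comma as raw
                // JSON; downstream U-SQL can parse it with JsonFunctions.JsonTuple.
                output.Set<string>(0, line.TrimEnd(','));
                yield return output.AsReadOnly();
            }
        }
    }
}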

Cheers, Michael

PS: You can have all-caps identifiers in U-SQL, but you need to quote them, e.g., AS [ABCD].