I've tried different methods to extract data from my JSON files and convert to CSV in U-SQL but they all seem to generate empty files or just output the header row.
I was previously trying JSON tuples but since that generated empty files I am now trying to use the MultiLevelJsonExtractor.
My JSON files are structured like this:
{
"responseHeader":{
"status":0,
"QTime":25,
"params":{
"q":"query",
"rows":"7000",
"wt":"json"
}
},
"response":{
"docs":[
{
"uri":"www.google.com",
"date_dt":"2017-05-30T23:00:00Z",
"title":"Google"
},
{
"uri":"www.yahoo.com",
"date_dt":"2017-03-30T23:00:00Z",
"title":"Yahoo"
}
]
}
}
My current code:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
DECLARE @INPUT_FILE string = @"{*}.json";
DECLARE @OUTPUT_FILE string = @"/Output/Output.csv";
@json =
EXTRACT
uri string,
date_dt string,
title string,
FROM
@INPUT_FILE
USING new MultiLevelJsonExtractor("docs[*]", true, "uri", "date_dt", "title");
OUTPUT @json
TO @OUTPUT_FILE
USING Outputters.Csv(outputHeader:true,quoting:true);
This just outputs the header row. Is there an issue with the way I am referencing multiple files? e.g {*}.json?
I'm just trying to pull out the three fields within the docs nodeset from each JSON file and output to CSV.