1
votes

I have data stored as JSON object per line in files. What is a good method to extract this in a U-SQL script?

I've got it working using a Text Extractor (see code below) however the JSON objects get big and I'm running into the 128KB size limit for strings. Any help would be appreciated thanks.

Sample data:

{ "prop1": "abc", "prop2": "xyz" }
{ "prop1": "def", "prop2": "uvw" }

U-SQL:

//Read (JSON Lines) line by line
@dataAsStrings =
    EXTRACT jsonObjStr string
    FROM @INPUT_FILE
    USING Extractors.Text(delimiter:'\n');

//Use the JsonTuple function to get the Json Token of the string so it can be parsed later with Json .NET functions
@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonObjStr, "prop1", "prop2") AS rec FROM @dataAsStrings;

//Extract the fields from the Json object.
@json = SELECT 
            rec["prop1"] AS prop1,
            rec["prop2"] AS prop2
FROM @jsonify;
2

2 Answers

1
votes

I know the answer is 1 year and 4 months old, but I hope this can help other users.

Try with the following query:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

@trial2 = 
    EXTRACT jsonString string FROM @"/<your-json-file>.json" USING Extractors.Tsv(quoting:false);

@cleanUp = SELECT jsonString FROM @trial2 WHERE (!jsonString.Contains("Part: h" ) AND jsonString!= "465}");

@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS props FROM @cleanUp;

@columnized = SELECT 
        props["prop1"] AS prop1,
        props["prop2"] AS prop2
FROM @jsonify;

OUTPUT @columnized
TO @"/out.csv"
USING Outputters.Csv();

You can check this page for details.

0
votes

You should probably write your own "hybrid" extractor that combines the line oriented extraction with the JSON extraction processing.