0
votes

I could not find proper solution to make use Newtonsoft JsonExtractor to parse Input file with new line delimiter.

From the Newtonsoft JsonExtractor I can read the first line successfully when exploded with "$.d.results[*]" but it's not moving to the next line,

As the data is >4MB for each row, can't extract as text. So parsing need to be performed using custom extractor to proceed.

Sample Input:

{"d":{"results":[{"data":{"Field_1":"1","Field_2":"2"},"Field_3":"3","Field_4":"4"}]}}                      
{"d":{"results":[{"data":{"Field_1":"11","Field_2":"21"},"Field_3":"31","Field_4":"41"}]}}
{"d":{"results":[{"data":{"Field_1":"12","Field_2":"22"},"Field_3":"32","Field_4":"42"}]}}

Expected Output:

Field_1|Field_2|Field_3|Field_4
1      |2      |3      |4
11     |21     |31     |41
12     |22     |32     |42

USQL Code:

CREATE ASSEMBLY IF NOT EXISTS [Microsoft.Analytics.Samples.Formats] FROM @"Microsoft.Analytics.Samples.Formats.dll";
CREATE ASSEMBLY IF NOT EXISTS [Newtonsoft.Json] FROM @"Newtonsoft.Json.dll";

REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
REFERENCE ASSEMBLY [Newtonsoft.Json];

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @DATA_SOURCE string = "Input.data" ;
@SOURCE =
    EXTRACT Field_1 string,
            Field_2 string,
            Field_3 string,
            Field_4 string
    FROM @DATA_SOURCE
USING new JsonExtractor("$.d.results[*]");
1

1 Answers

0
votes

Considering your source input as below, i.e. more than one JSON document in a file.

{
    "d": {
        "results": [
            {
                "data": {
                    "Field_1": "1",
                    "Field_2": "2"
                },
                "Field_3": "3",
                "Field_4": "4"
            }
        ]
    }
}                      
{
    "d": {
        "results": [
            {
                "data": {
                    "Field_1": "11",
                    "Field_2": "21"
                },
                "Field_3": "31",
                "Field_4": "41"
            }
        ]
    }
}
{
    "d": {
        "results": [
            {
                "data": {
                    "Field_1": "12",
                    "Field_2": "22"
                },
                "Field_3": "32",
                "Field_4": "42"
            }
        ]
    }
}

// if more than one JSON document in a file

The JSON assembly includes a MultiLevelJsonExtractor which allows us to extract data from multiple JSON paths at differing levels within a single pass. See the underlying code and inline documentation over at Github.

Use it by supplying multiple levels of Json Paths. They will be assigned to the schema by index.

The code snippet shows the MultiLevelJsonExtractor in action.

  • The first parameter (rowpath) specifies the base path to start from.
  • The second parameter (bypassWarning) is expecting a boolean value. True = If path isn't found: don't error, return null. False = If path isn't found: error.
  • The third parameter (jsonPaths) is a list of JSON paths starting at the base path otherwise the extractor will recurse to the top of the tree to locate it.

in your case....

   @json =
    EXTRACT
        Field_1 string,
        Field_2 string,
        Field_3 string,
        Field_4 string,
    FROM
        @DATA_SOURCE
    USING new MultiLevelJsonExtractor("d.results[*]",
        true,
        "data.Field_1",
        "data.Field_2",
        "Field_3",
        "Field_4",
         );