0
votes

I’ve a TSV file and one column is a json string, which has array of objects. I need to convert rows into multiple rows based on jsonarray. Can you please guide me to extract the data?

Example row: Product ID Customers Azure SQL 465383 [{"Customer": "Dell", "Country": "US"},{"Customer": "HP","Country": "Germany"}]

Output Expected: Product ID Customer Country Azure SQL 465383 Dell US Azure SQL 465383 HP Germany

Thanks in advance!

1

1 Answers

2
votes

You can use the JsonTuple method in the Microsoft.Analytics.Samples.Formats JSON samples supplied in github here. If you are not sure how to install them, follow the tutorial here.

For example, I got this script to work using this file.

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @inputFile string = "/input/input103.tsv";

@input =
    EXTRACT Product string,
            id string,
            Customers string        // JSON column
    FROM @inputFile
    USING Extractors.Tsv(skipFirstNRows:1);


// Convert Customer JSON string to tuple
@working =
    SELECT Product,
           id,
           JsonFunctions.JsonTuple(Customers).Values AS Customers_map
    FROM @input;


// Explode the MAP to get a row per key-value pair
@output =
    SELECT Product,
           id,
           JsonFunctions.JsonTuple(y) ["Customer"] AS Customer,
           JsonFunctions.JsonTuple(y) ["Country"] AS Country
    FROM @working
         CROSS APPLY
             EXPLODE(Customers_map) AS x(y);


OUTPUT @output
TO "/output/output.csv"
USING Outputters.Csv();

My results:

Results