3 votes

I have a JSON source document that will be uploaded to Azure Blob Storage regularly. The customer wants this input written to an Azure SQL Database using Azure Data Factory. However, the JSON is complex, with many nested arrays, and so far I have not been able to find a way to flatten the document. Perhaps this is not supported/possible?

[
    {
        "ActivityId": 1,
        "Header": {},
        "Body": [{
            "1stSubArray": [{
                "Id": 456,
                "2ndSubArray": [{
                    "Id": "abc",
                    "Descript": "text",
                    "3rdSubArray": [{
                        "Id": "def",
                        "morefields": "text"
                    },
                    {
                        "Id": "ghi",
                        "morefields": "sample"
                    }]
                }]
            }]
        }]
    }
]

I need to flatten it:

ActivityId, Id, Id, Descript, Id, morefields
1, 456, abc, text1, def, text
1, 456, abc, text2, ghi, sample
1, 456, xyz, text3, jkl, textother
1, 456, xyz, text4, mno, moretext 

There could be 8+ flat records per ActivityId. Has anyone out there seen this and found a way to resolve it using the Azure Data Factory Copy Data activity?

Could you follow this general approach? Instead of a Web Activity you could use a Lookup Activity to get the contents of the Blob and then pass it as a parameter into the stored proc. stackoverflow.com/a/56962908/5070440 – GregGalloway

2 Answers

1 vote

In the past, you could follow this blog and my previous case, Loosing data from Source to Sink in Copy Data, to set the Cross-apply nested JSON array option in the Blob Storage dataset. However, that option has since disappeared.

Instead, the Collection Reference property is now used to map array items in the Copy activity's schema mapping.


Based on my test, however, only one array can be flattened per schema. Multiple arrays can be referenced, but they are returned as a single row containing all of the array's elements. Only one array can have each of its elements returned as an individual row; this is a current limitation of the jsonPath settings.
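For reference, this mapping lives in the copy activity's translator. A rough sketch of the shape, assuming the TabularTranslator format (the paths here are illustrative, not tested against this exact document): only the array named in collectionReference is exploded into one row per element, and anything deeper has to be reached with an explicit index, which is exactly the limitation described above.

"translator": {
    "type": "TabularTranslator",
    "mappings": [
        { "source": { "path": "$['ActivityId']" }, "sink": { "name": "ActivityId" } },
        { "source": { "path": "['1stSubArray'][0]['Id']" }, "sink": { "name": "Id1" } }
    ],
    "collectionReference": "$['Body']"
}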


As a workaround, you can first convert the JSON file with nested objects into a CSV file using a Logic App, and then use the CSV file as input for Azure Data Factory. Please refer to this doc to understand how a Logic App can be used to convert nested objects in a JSON file to CSV. You could also do some of the work on the SQL Database side, such as the stored procedure approach mentioned in the comment by @GregGalloway (a rough sketch of the SQL side follows).
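Going the stored procedure route, the Lookup activity can read the blob and its output can be passed as a parameter to a Stored Procedure activity. A minimal sketch of the SQL side, with hypothetical table and procedure names (the exact ADF expression for the parameter depends on your dataset):

-- Hypothetical staging objects: the Stored Procedure activity passes the
-- Lookup output into @json; shredding can then happen in T-SQL.
CREATE TABLE dbo.ActivityJsonStaging
(
    RawJson     NVARCHAR(MAX) NOT NULL,
    LoadedAtUtc DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

CREATE PROCEDURE dbo.usp_LoadActivityJson
    @json NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.ActivityJsonStaging ( RawJson ) VALUES ( @json );
END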


To summarize: unfortunately, Collection Reference only works one level down in the array structure, which was not suitable for @Emrikol. In the end, @Emrikol abandoned Data Factory and built an app to do the work.

1 vote

Azure SQL Database has some capable JSON-shredding abilities, including OPENJSON, which shreds a JSON document into rows, and JSON_VALUE, which returns scalar values from JSON. Since you already have Azure SQL DB in your architecture, it makes sense to use it rather than add additional components.
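A trivial illustration of the two functions:

-- JSON_VALUE pulls a single scalar out of a document
SELECT JSON_VALUE ( '{"ActivityId": 1}', '$.ActivityId' ) AS ActivityId;

-- OPENJSON with the default schema returns one row per array element,
-- exposing key, value and type columns
SELECT [key], [value], [type]
FROM OPENJSON ( '[ { "Id": "abc" }, { "Id": "def" } ]' );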

So why not adopt an ELT pattern, where you use Data Factory to insert the JSON into a table in Azure SQL DB and then call a Stored Procedure activity to shred it? Some sample SQL based on your example:

DECLARE @json NVARCHAR(MAX) = '[
{
  "ActivityId": 1,
  "Header": {},
  "Body": [
    {
      "1stSubArray": [
        {
          "Id": 456,
          "2ndSubArray": [
            {
              "Id": "abc",
              "Descript": "text",
              "3rdSubArray": [
                {
                  "Id": "def",
                  "morefields": "text"
                },
                {
                  "Id": "ghi",
                  "morefields": "sample"
                }
              ]
            },
            {
              "Id": "xyz",
              "Descript": "text",
              "3rdSubArray": [
                {
                  "Id": "jkl",
                  "morefields": "textother"
                },
                {
                  "Id": "mno",
                  "morefields": "moretext"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
]'

--SELECT @json j

-- INSERT INTO yourTable ( ...
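-- Each OPENJSON call below shreds one array level: the outer array first,
-- then CROSS APPLY expands Body, 1stSubArray, 2ndSubArray and 3rdSubArray
-- in turn, so every 3rdSubArray element becomes its own row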
SELECT
    JSON_VALUE ( j.[value], '$.ActivityId' ) AS ActivityId,
    JSON_VALUE ( a1.[value], '$.Id' ) AS Id1,
    JSON_VALUE ( a2.[value], '$.Id' ) AS Id2,
    JSON_VALUE ( a2.[value], '$.Descript' ) AS Descript,
    JSON_VALUE ( a3.[value], '$.Id' ) AS Id3,
    JSON_VALUE ( a3.[value], '$.morefields' ) AS morefields

FROM OPENJSON( @json ) j
    CROSS APPLY OPENJSON ( j.[value], '$."Body"' ) AS m
        CROSS APPLY OPENJSON ( m.[value], '$."1stSubArray"' ) AS a1
            CROSS APPLY OPENJSON ( a1.[value], '$."2ndSubArray"' ) AS a2
                CROSS APPLY OPENJSON ( a2.[value], '$."3rdSubArray"' ) AS a3;

As you can see, I've used CROSS APPLY to navigate the multiple levels. Against the sample document above, the query returns four flattened rows, one per 3rdSubArray element.
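To wire this into the ELT pattern, the same query can be wrapped in a stored procedure for the pipeline to call after landing the JSON. A sketch with illustrative object names (the target table is assumed to exist):

CREATE PROCEDURE dbo.usp_ShredActivityJson
    @json NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.ActivityFlat ( ActivityId, Id1, Id2, Descript, Id3, morefields )
    SELECT
        JSON_VALUE ( j.[value], '$.ActivityId' ),
        JSON_VALUE ( a1.[value], '$.Id' ),
        JSON_VALUE ( a2.[value], '$.Id' ),
        JSON_VALUE ( a2.[value], '$.Descript' ),
        JSON_VALUE ( a3.[value], '$.Id' ),
        JSON_VALUE ( a3.[value], '$.morefields' )
    FROM OPENJSON( @json ) j
        CROSS APPLY OPENJSON ( j.[value], '$."Body"' ) AS m
        CROSS APPLY OPENJSON ( m.[value], '$."1stSubArray"' ) AS a1
        CROSS APPLY OPENJSON ( a1.[value], '$."2ndSubArray"' ) AS a2
        CROSS APPLY OPENJSON ( a2.[value], '$."3rdSubArray"' ) AS a3;
END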