3 votes

I have a JSON source document that will be uploaded to Azure Blob Storage regularly. The customer wants this input written to an Azure SQL Database using Azure Data Factory. However, the JSON is complex, with many nested arrays, and so far I have not been able to find a way to flatten the document. Perhaps this is not supported/possible?

[
    {
        "ActivityId": 1,
        "Header": {},
        "Body": [{
            "1stSubArray": [{
                "Id": 456,
                "2ndSubArray": [{
                    "Id": "abc",
                    "Descript": "text",
                    "3rdSubArray": [{
                        "Id": "def",
                        "morefields": "text"
                    },
                    {
                        "Id": "ghi",
                        "morefields": "sample"
                    }]
                }]
            }]
        }]
    }
]

I need to flatten it:

ActivityId, Id, Id, Descript, Id, morefields
1, 456, abc, text1, def, text
1, 456, abc, text2, ghi, sample
1, 456, xyz, text3, jkl, textother
1, 456, xyz, text4, mno, moretext 

There could be 8+ flat records per ActivityId. Has anyone out there seen this and found a way to resolve it using the Azure Data Factory Copy Data activity?

Could you follow this general approach? Instead of a Web Activity you could use a Lookup Activity to get the contents of the Blob and then pass it as a parameter into the stored proc. stackoverflow.com/a/56962908/5070440 – GregGalloway

2 Answers

1 vote

In the past, you could follow this blog and my previous case, Loosing data from Source to Sink in Copy Data, to set the Cross-apply nested JSON array option in the Blob Storage dataset. However, that option has since disappeared.

Instead, the Collection Reference property is now used to map array items in the Copy activity's schema mapping.


Based on my test, however, only one array can be flattened per schema. Multiple arrays can be referenced, but they are returned as a single row containing all of the array's elements. Only one array can have each of its elements returned as an individual row; this is a current limitation of the jsonPath settings.
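For reference, this mapping lives in the copy activity's translator. A rough sketch of the shape, assuming the TabularTranslator format (the paths here are illustrative, not tested against this exact document): only the array named in collectionReference is exploded into one row per element, and anything deeper has to be reached with an explicit index, which is exactly the limitation described above.

"translator": {
    "type": "TabularTranslator",
    "mappings": [
        { "source": { "path": "$['ActivityId']" }, "sink": { "name": "ActivityId" } },
        { "source": { "path": "['1stSubArray'][0]['Id']" }, "sink": { "name": "Id1" } }
    ],
    "collectionReference": "$['Body']"
}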


As a workaround, you can first convert the JSON file with nested objects into a CSV file using a Logic App, and then use the CSV file as input for Azure Data Factory. Please refer to this doc to understand how a Logic App can be used to convert nested objects in a JSON file to CSV. You could also do some of the work on the SQL Database side, such as the stored procedure approach mentioned in the comment by @GregGalloway (a rough sketch of the SQL side follows).
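Going the stored procedure route, the Lookup activity can read the blob and its output can be passed as a parameter to a Stored Procedure activity. A minimal sketch of the SQL side, with hypothetical table and procedure names (the exact ADF expression for the parameter depends on your dataset):

-- Hypothetical staging objects: the Stored Procedure activity passes the
-- Lookup output into @json; shredding can then happen in T-SQL.
CREATE TABLE dbo.ActivityJsonStaging
(
    RawJson     NVARCHAR(MAX) NOT NULL,
    LoadedAtUtc DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

CREATE PROCEDURE dbo.usp_LoadActivityJson
    @json NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.ActivityJsonStaging ( RawJson ) VALUES ( @json );
END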


To summarize: unfortunately, Collection Reference only works one level down in the array structure, which was not suitable for @Emrikol. In the end, @Emrikol abandoned Data Factory and built an app to do the work.

1 vote

Azure SQL Database has some capable JSON-shredding abilities, including OPENJSON, which shreds a JSON document into rows, and JSON_VALUE, which returns scalar values from JSON. Since you already have Azure SQL DB in your architecture, it makes sense to use it rather than add additional components.
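A trivial illustration of the two functions:

-- JSON_VALUE pulls a single scalar out of a document
SELECT JSON_VALUE ( '{"ActivityId": 1}', '$.ActivityId' ) AS ActivityId;

-- OPENJSON with the default schema returns one row per array element,
-- exposing key, value and type columns
SELECT [key], [value], [type]
FROM OPENJSON ( '[ { "Id": "abc" }, { "Id": "def" } ]' );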

So why not adopt an ELT pattern, where you use Data Factory to insert the JSON into a table in Azure SQL DB and then call a Stored Procedure activity to shred it? Some sample SQL based on your example:

DECLARE @json NVARCHAR(MAX) = '[
{
  "ActivityId": 1,
  "Header": {},
  "Body": [
    {
      "1stSubArray": [
        {
          "Id": 456,
          "2ndSubArray": [
            {
              "Id": "abc",
              "Descript": "text",
              "3rdSubArray": [
                {
                  "Id": "def",
                  "morefields": "text"
                },
                {
                  "Id": "ghi",
                  "morefields": "sample"
                }
              ]
            },
            {
              "Id": "xyz",
              "Descript": "text",
              "3rdSubArray": [
                {
                  "Id": "jkl",
                  "morefields": "textother"
                },
                {
                  "Id": "mno",
                  "morefields": "moretext"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
]'

--SELECT @json j

-- INSERT INTO yourTable ( ...
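-- Each OPENJSON call below shreds one array level: the outer array first,
-- then CROSS APPLY expands Body, 1stSubArray, 2ndSubArray and 3rdSubArray
-- in turn, so every 3rdSubArray element becomes its own row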
SELECT
    JSON_VALUE ( j.[value], '$.ActivityId' ) AS ActivityId,
    JSON_VALUE ( a1.[value], '$.Id' ) AS Id1,
    JSON_VALUE ( a2.[value], '$.Id' ) AS Id2,
    JSON_VALUE ( a2.[value], '$.Descript' ) AS Descript,
    JSON_VALUE ( a3.[value], '$.Id' ) AS Id3,
    JSON_VALUE ( a3.[value], '$.morefields' ) AS morefields

FROM OPENJSON( @json ) j
    CROSS APPLY OPENJSON ( j.[value], '$."Body"' ) AS m
        CROSS APPLY OPENJSON ( m.[value], '$."1stSubArray"' ) AS a1
            CROSS APPLY OPENJSON ( a1.[value], '$."2ndSubArray"' ) AS a2
                CROSS APPLY OPENJSON ( a2.[value], '$."3rdSubArray"' ) AS a3;

As you can see, I've used CROSS APPLY to navigate the multiple levels. Against the sample document above, the query returns four flattened rows, one per 3rdSubArray element.
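To wire this into the ELT pattern, the same query can be wrapped in a stored procedure for the pipeline to call after landing the JSON. A sketch with illustrative object names (the target table is assumed to exist):

CREATE PROCEDURE dbo.usp_ShredActivityJson
    @json NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.ActivityFlat ( ActivityId, Id1, Id2, Descript, Id3, morefields )
    SELECT
        JSON_VALUE ( j.[value], '$.ActivityId' ),
        JSON_VALUE ( a1.[value], '$.Id' ),
        JSON_VALUE ( a2.[value], '$.Id' ),
        JSON_VALUE ( a2.[value], '$.Descript' ),
        JSON_VALUE ( a3.[value], '$.Id' ),
        JSON_VALUE ( a3.[value], '$.morefields' )
    FROM OPENJSON( @json ) j
        CROSS APPLY OPENJSON ( j.[value], '$."Body"' ) AS m
        CROSS APPLY OPENJSON ( m.[value], '$."1stSubArray"' ) AS a1
        CROSS APPLY OPENJSON ( a1.[value], '$."2ndSubArray"' ) AS a2
        CROSS APPLY OPENJSON ( a2.[value], '$."3rdSubArray"' ) AS a3;
END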