
I am using the YouTube Analytics and Reporting API to get performance data from a YouTube channel and to ingest it programmatically with Azure Data Factory (ADF). From the YouTube API, I get the following JSON output format:

{
  "kind": "youtubeAnalytics#resultTable",
  "columnHeaders": [
    {
      "name": string,
      "dataType": string,
      "columnType": string
    },
    ... more headers ...
  ],
  "rows": [
    [
      {value}, {value}, ...
    ]
  ]
}

See link here: https://developers.google.com/youtube/analytics/reference/reports/query#response
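
For illustration, a concrete response (with made-up values) looks like this; note that each entry in "rows" is a bare array without keys:

{
  "kind": "youtubeAnalytics#resultTable",
  "columnHeaders": [
    { "name": "day", "dataType": "STRING", "columnType": "DIMENSION" },
    { "name": "views", "dataType": "INTEGER", "columnType": "METRIC" }
  ],
  "rows": [
    ["2021-01-01", 100],
    ["2021-01-02", 250]
  ]
}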

The response of this GET request is saved as a JSON file in a blob storage. As a next step, I want to copy the data from this JSON file into an Azure SQL database. At this point, Azure Data Factory is unable to read the JSON's schema correctly. Somehow, the "array in array" syntax and the missing keys in the "rows" section (unlike in columnHeaders, e.g. "name": string) are not readable, see screenshot: JSON schema in ADF

Apparently, Azure only allows an array of objects: https://docs.microsoft.com/en-gb/azure/data-factory/format-json

If I manually change the syntax in the rows section to match the columnHeaders section, the schema is read correctly, which proves to me that ADF is unable to process nested arrays. As this should be an automated process, the manual workaround is not an option and was for testing purposes only.
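
For reference, the rewrite I do by hand is equivalent to this small Python sketch (file names are placeholders), which zips each positional row with the column names from columnHeaders:

import json

# Load the raw YouTube Analytics response (already downloaded from blob storage)
with open("report.json") as f:
    report = json.load(f)

# Column names come from the columnHeaders section
names = [h["name"] for h in report["columnHeaders"]]

# Turn each positional row array into a keyed object,
# e.g. ["2021-01-01", 100] -> {"day": "2021-01-01", "views": 100}
report["rows"] = [dict(zip(names, row)) for row in report["rows"]]

# Write the reshaped file back; this is the shape ADF reads correctly
with open("report_flat.json", "w") as f:
    json.dump(report, f)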

Can anyone please tell me how Azure can read the JSON file properly and copy the data into the SQL database? Using a data flow did not look very promising to me, but maybe it contains some data transformation features I am not aware of. Thanks for helping me out!

Actually, Data Factory can read the JSON file properly and copy the data into the SQL database. The limitation is that not all JSON files can be loaded into an Azure SQL database correctly. For a nested JSON file, for example, we first need to do the schema conversion manually with a Data Flow; then the data can be loaded well. Your JSON file is nested, and the value of rows is an array, so we must do these things. – Leon Yue
Hi @Daniel, if my answer is helpful for you, I hope you can accept it as the answer (click on the check mark beside the answer to toggle it from greyed out to filled in). This can be beneficial to other community members. Thank you. – Leon Yue
Do you have any other concerns? – Leon Yue
Hi there! Thanks a lot for your help, I appreciate it! I also had a gut feeling to go with a data flow. Unfortunately, I am having trouble there as well. In the data preview of my JSON file, ADF cannot read it: I get a "Malformed records are detected in schema inference. Parse Mode: FAILFAST." error message. Any idea why that is? – Daniel
Hi @Daniel, can you give us any screenshots? You could post a new question. We are all glad to help you! – Leon Yue

1 Answer


Actually, Data Factory can read the JSON file properly and copy the data into the SQL database. The limitation is that not all JSON files can be loaded into an Azure SQL database correctly.

For a nested JSON file, for example, we first need to do the schema conversion manually with a Data Flow; then the data can be loaded well. Per my experience, we usually need to create two sources from the same JSON file, get the array data we need from each, and then join them together to get the expected data.

Your JSON file is nested JSON, and the value of rows is an array of arrays, so we must do these things. Otherwise we will get errors and the pipeline won't work.
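
If you prefer to do the load in code instead of a copy activity or Data Flow, the equivalent insert looks roughly like this sketch, assuming the rows were already reshaped into keyed objects as described above; the connection string, the table dbo.YouTubeStats, and its columns are placeholders you must replace:

import json
import pyodbc

# Placeholder connection string for an Azure SQL database
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;Uid=myuser;Pwd=mypassword;"
)
cursor = conn.cursor()

with open("report_flat.json") as f:
    report = json.load(f)

# One INSERT per reshaped row; table and column names are hypothetical
for row in report["rows"]:
    cursor.execute(
        "INSERT INTO dbo.YouTubeStats ([day], [views]) VALUES (?, ?)",
        row["day"], row["views"],
    )
conn.commit()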

HTH.