0
votes

I have limited experience with API data ingestion and we're running into issues where the API JSON response is dynamic in nature with bunch of arrays within the JSON for multiple elements. How do you handle dynamic JSON where in you're not aware of elements or even multiple values that you will encounter and then convert into a format that can be easily loaded into Snowflake or any other relational database? We tried converting the entire JSON row into a single row but then we don't know how to parse multiple columns (ABC0, ABC1) for the same column (ABC).

"Job_Profile_Reference": {
      "ID": [
      {
        "@type": "WI",
        "__text": "4acf03b2e8e301fab3caeab105020f"
      },
      {
        "@type": "Job_Profile_ID",
        "__text": "112"
      }
      ]
    }

output

Job_Profile.Job_Profile_Reference.ID.0.@type,Job_Profile.Job_Profile_Reference.ID.0.__text,Job_Profile.Job_Profile_Reference.ID.1.@type,Job_Profile.Job_Profile_Reference.ID.1.__text
WI,4acf03b2e8e301fab3caeab105020f,Job_Profile_ID,112
1
Have you considered loading the JSON directly into Snowflake and querying it directly? You can parse what you need using Snowflake SQL or even create purpose-built views. In you example above, is that just a single record of JSON, or are all of your records part of that "ID" array? - Mike Walton
We can but we're using azure as our data lake and would like to denormalize the JSON to CSV for consumption not only for Snowflake but other downstream applications. - hiphop
This doesn't seem like a problem for Snowflake -- since you want to do all this before loading into Snowflake. Try reposting the question for the jq, and they'll help you transform a json. - Felipe Hoffa
If you want help from the jq community, it would be best to follow the minimal reproducible example guidelines. In particular, it would be helpful to see at least one sample of valid JSON, with the corresponding desired output for each. - peak

1 Answers

0
votes

Dynamic JSON can be easily loaded into Snowflake as is. In your example, with variable length arrays, you can use FLATTEN() to explode the single row of JSON into multiple rows in a query. In this scenario I would suggest you leave it as JSON in your data lake too.

The penalty for such things is relatively low since the JSON data isn't just stored as a long string; instead internally Snowflake analyzes the JSON as its loaded and does things to aid speedy queries, like creating virtual columns for frequent paths like $.Job_Profile_Reference and $.Job_Profile_Reference.ID in your example.

with semistructured as (select parse_json('{"Job_Profile_Reference": {"ID": [{"@type": "WI","__text": "4acf03b2e8e301fab3caeab105020f"},{"@type": "Job_Profile_ID","__text": "112"}]}}') json)
select * from semistructured, lateral flatten(json, recursive=>True) structured;

JSON    SEQ KEY PATH    INDEX   VALUE   THIS
{    "Job_Profile_Reference": {      "ID": [        {          "@type": "WI",          "__text": "4acf03b2e8e301fab3caeab105020f"        },        {          "@type": "Job_Profile_ID",          "__text": "112"        }      ]    }  }   1   Job_Profile_Reference   Job_Profile_Reference       {    "ID": [      {        "@type": "WI",        "__text": "4acf03b2e8e301fab3caeab105020f"      },      {        "@type": "Job_Profile_ID",        "__text": "112"      }    ]  }  {    "Job_Profile_Reference": {      "ID": [        {          "@type": "WI",          "__text": "4acf03b2e8e301fab3caeab105020f"        },        {          "@type": "Job_Profile_ID",          "__text": "112"        }      ]    }  }
{    "Job_Profile_Reference": {      "ID": [        {          "@type": "WI",          "__text": "4acf03b2e8e301fab3caeab105020f"        },        {          "@type": "Job_Profile_ID",          "__text": "112"        }      ]    }  }   1   ID  Job_Profile_Reference.ID        [    {      "@type": "WI",      "__text": "4acf03b2e8e301fab3caeab105020f"    },    {      "@type": "Job_Profile_ID",      "__text": "112"    }  ]  {    "ID": [      {        "@type": "WI",        "__text": "4acf03b2e8e301fab3caeab105020f"      },      {        "@type": "Job_Profile_ID",        "__text": "112"      }    ]  }
{    "Job_Profile_Reference": {      "ID": [        {          "@type": "WI",          "__text": "4acf03b2e8e301fab3caeab105020f"        },        {          "@type": "Job_Profile_ID",          "__text": "112"        }      ]    }  }   1       Job_Profile_Reference.ID[0] 0   {    "@type": "WI",    "__text": "4acf03b2e8e301fab3caeab105020f"  }    [    {      "@type": "WI",      "__text": "4acf03b2e8e301fab3caeab105020f"    },    {      "@type": "Job_Profile_ID",      "__text": "112"    }  ]
{    "Job_Profile_Reference": {      "ID": [        {          "@type": "WI",          "__text": "4acf03b2e8e301fab3caeab105020f"        },        {          "@type": "Job_Profile_ID",          "__text": "112"        }      ]    }  }   1   @type   Job_Profile_Reference.ID[0]['@type']        "WI"    {    "@type": "WI",    "__text": "4acf03b2e8e301fab3caeab105020f"  }
{    "Job_Profile_Reference": {      "ID": [        {          "@type": "WI",          "__text": "4acf03b2e8e301fab3caeab105020f"        },        {          "@type": "Job_Profile_ID",          "__text": "112"        }      ]    }  }   1   __text  Job_Profile_Reference.ID[0].__text      "4acf03b2e8e301fab3caeab105020f"    {    "@type": "WI",    "__text": "4acf03b2e8e301fab3caeab105020f"  }
{    "Job_Profile_Reference": {      "ID": [        {          "@type": "WI",          "__text": "4acf03b2e8e301fab3caeab105020f"        },        {          "@type": "Job_Profile_ID",          "__text": "112"        }      ]    }  }   1       Job_Profile_Reference.ID[1] 1   {    "@type": "Job_Profile_ID",    "__text": "112"  }   [    {      "@type": "WI",      "__text": "4acf03b2e8e301fab3caeab105020f"    },    {      "@type": "Job_Profile_ID",      "__text": "112"    }  ]
{    "Job_Profile_Reference": {      "ID": [        {          "@type": "WI",          "__text": "4acf03b2e8e301fab3caeab105020f"        },        {          "@type": "Job_Profile_ID",          "__text": "112"        }      ]    }  }   1   @type   Job_Profile_Reference.ID[1]['@type']        "Job_Profile_ID"    {    "@type": "Job_Profile_ID",    "__text": "112"  }
{    "Job_Profile_Reference": {      "ID": [        {          "@type": "WI",          "__text": "4acf03b2e8e301fab3caeab105020f"        },        {          "@type": "Job_Profile_ID",          "__text": "112"        }      ]    }  }   1   __text  Job_Profile_Reference.ID[1].__text      "112"   {    "@type": "Job_Profile_ID",    "__text": "112"  }```