I have a web scraper dumping data into a variant column in a Snowflake database. This is acraping page data as then creates json arrays for various tables found within the page.
Here is an example of the type of json i would find using a Soccer analogy:
{
"dom_url": "https://www.soccertables.com/european_tables",
"event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
"event_utc_time": "2020-05-11 09:01:14.821",
"ip_address": "125.238.134.96",
"table_1": [
{
"position": "1",
"team_name": "Liverpool",
"games_played": "29",
"games_won": "26",
"games_drawn": "2",
"games_lost": "1",
"goals_for": "75",
"goals_against": "35"
"points": "80"
},
{
"position": "2",
"team_name": "Man. City",
"games_played": "29",
"games_won": "20",
"games_drawn": "5",
"games_lost": "4",
"goals_for": "60",
"goals_against": "45"
"points": "65"
},
{
"position": "...",
"team_name": "...",
"games_played": "...",
"games_won": "...",
"games_drawn": "...",
"games_lost": "...",
"goals_for": "...",
"goals_against": "..."
"points": "..."
}
],
"table_2": [
{
"position": "1",
"team_name": "Bayern Munich",
"games_played": "29",
"games_won": "26",
"games_drawn": "2",
"games_lost": "1",
"goals_for": "75",
"goals_against": "35"
"points": "80"
},
{
"position": "2",
"team_name": "Bayer Leverkussen",
"games_played": "29",
"games_won": "20",
"games_drawn": "5",
"games_lost": "4",
"goals_for": "60",
"goals_against": "45"
"points": "65"
},
{
"position": "...",
"team_name": "...",
"games_played": "...",
"games_won": "...",
"games_drawn": "...",
"games_lost": "...",
"goals_for": "...",
"goals_against": "..."
"points": "..."
}
],
"referrer_url": "https://www.soccertables.com",
}
Ideally, i'd like the output of this to be a flat, relational table:
table_name position team_name games_played etc... table_1 1 Liverpool 29 ... table_1 2 Man. City 29 ... table_2 1 Bayern Munich 29 ... ....
I know that if i were only interested in table_1 i could do this:
SELECT v.value:position::NUMBER POSITION
, v.value:team_name::STRING TEAM_NAME
, v.value:games_played::NUMBER GAMES_PLAYED
, ...
FROM JSON_TABLE a1, LATERAL FLATTEN(JSON_DATA:table_1) v
and that i could do the same for table_2 and union them, but there can be N possibilities with regards to the table_N placeholder.
I've looked at doing LATERAL FLATTEN multiple times:
SELECT v.value:position::NUMBER POSITION
, v.value:team_name::STRING TEAM_NAME
, v.value:games_played::NUMBER GAMES_PLAYED
, ...
FROM JSON_TABLE a1, LATERAL FLATTEN(JSON_DATA:table_1) v, LATERAL FLATTEN(JSON_DATA:table_2) v2
But this results in duplication of data, and does not allow me to put each tables columns all in a single relational structure.
I'm sure there is something simple that i am missing here, but i've reached a point where i think i've been staring at this too long, and just can';t see it.
Thanks in advance, S