0
votes

I ingest data into a table source_table with AVRO data. There is a column in this table say "avro_data" which will be populated with variant data. I plan to copy data into a structured table target_table where columns have the same name and datatype as the avro_data fields in the source table. Example: select avro_data from source_table

{"C1":"V1", "C2", "V2"}

This will result in

select * from target_table
------------
| C1  | C2 |
------------
| V1  | V2 |
------------

My question is when schema of the avro_data evolves and new fields get added, how can I keep schema of the target_table in sync by adding equivalent columns in the target table?

Is there anything out of the box in snowflake to achieve this or if someone has created any code to do something similar?

1

1 Answers

0
votes

Here's something to get you started. It shows how to take a variant column and parse out the internal columns. This uses a table in the Snowflake sample data database, which is not always the same. You can to adjust the table name and column name.

SELECT DISTINCT regexp_replace(regexp_replace(f.path,'\\\\[(.+)\\\\]'),'(\\\\w+)','\"\\\\1\"')                      AS path_name,       -- This generates paths with levels enclosed by double quotes (ex: "path"."to"."element").  It also strips any bracket-enclosed array element references (like "[0]")
                DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING')     AS attribute_type,  -- This generates column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
                REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\\\[(.+)\\\\]'),'[^a-zA-Z0-9]','_')                         AS alias_name       -- This generates column aliases based on the path
FROM
        "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."JCUSTOMER",
        LATERAL FLATTEN("CUSTOMER", RECURSIVE=>true) f
WHERE   TYPEOF(f.value) != 'OBJECT'
        AND NOT contains(f.path, '[');

This is a snippet of code modified from here: https://community.snowflake.com/s/article/Automating-Snowflake-Semi-Structured-JSON-Data-Handling. The blog author attributes credit to a colleague for this section of code.

While the current incarnation of the stored procedure will create a view from the internal columns in a variant, an alternate version could create and/or alter a table to keep it in sync with changes.