0
votes

Fairly new to Snowflake but have loaded data successfully via various methods.

Problem: I am calling a Web API via Azure Data Factory using the pagination rules to write to a single JSON file in blob storage. This is done using multiple calls via single ADF data copy activity utilising the AbsoluteURL to merge to a single file (I could write extra, complex ADF logic to generate multiple files, but for this example want to go with a single file approach). This is really efficient in my ADF pipeline and I can either write as an array or set of objects.

The ADF part works well, but I am having trouble parsing the "merged" JSON file when loading into a table of rows in Snowflake via a stage.

The output JSON file called "setOfObjects.json" looks like this for ADF JSON sink setting: setOfObjects (JSON lines):

{"values":[{"a":"1","b":"1","c":"1"},{"a":"2","b":"2","c":"2"}]}
{"values":[{"a":"3","b":"3","c":"3"},{"a":"4","b":"4","c":"4"}]}
<empty line>

In Snowflake, I create a stage such as

create or replace stage dbo.stage_json_example url = 'azure://<endpoint>.blob.core.windows.net/test/stackOverflow/'
credentials = (azure_sas_token = '<sas_token>')
file_format = (type = 'json');

Then in Snowflake, I try to parse the JSON file such as, but it returns NULL value:

select $1:values:a::string
from @dbo.stage_json_example/setOfObjects.json;

From above, I want to convert the single JSON file into 4 rows being:

a b c
1 1 1
2 2 2
3 3 3
4 4 4

Any suggestions are appreciated. Thanks

2
Please include the code you used to define @dbo.stage_json_example. Also, try to have only one JSON format to parse - check ADF docs.microsoft.com/en-us/azure/data-factory/format-json to set the output to JSON Lines (the default) - Felipe Hoffa
Yes I have tried both setOfObjects (JSON Lines),which is the "Set of Object" example above, as well as arrayOfObjects which is the "Set of Array" example above. Have also tried all of the "Copy Behaviour" options which do not change the JSON file format. Snowflake stage is standard and I have different stage options for both strip_outer_array values. create or replace stage dbo.stage_json_example url = '<url>' credentials = (azure_sas_token = '<sas_token>') file_format = (type = 'json',strip_outer_array = <true/false>); - TonyGaul
For simplicity of reproducing the problem - can you give me one file example, and one setup example? I'm confused by having the 2 alternatives in the question. If I can reproduce for one file, then I can help you solve the issue. - Felipe Hoffa
@FelipeHoffa Hi, I figured out the problem by looking through a bunch of Snowflake articles. Thanks for your help. - TonyGaul

2 Answers

1
votes

OK, found the answer from this link https://community.snowflake.com/s/article/json-data-parsing-in-snowflake

This Snowflake code solved the problem.

select t.value:a::int as "a", t.value:b::int as "b", t.value:c::int  as "c"
from @dbo.stage_json_example/setOfObjects.json as S
, table(flatten(S.$1,'values')) t;
0
votes

Perhabs you try to write a { at the beginning and } at the end of your document. JSON always needs a document root. With this change you have one.

so your Example would looks like:

{
 {"values":[{"a":"1","b":"1","c":"1"},{"a":"2","b":"2","c":"2"}]}
 {"values":[{"a":"3","b":"3","c":"3"},{"a":"4","b":"4","c":"4"}]} 
}