1 vote

My company is in the process of migrating from a local DB to a data warehouse, as the load is too much for SQL Server at the moment. We've looked at which cloud solutions we might be able to use and decided on Snowflake. We need to process quite heavy compressed JSONs, at times up to 300 MB per file. I've read through the documentation and created the stage, the file format, and the table as (json_data variant). I've loaded my first JSON file using the SnowSQL CLI and that worked too. The test file is 3.7 KB. When trying to COPY INTO mytable I got this error:

Error parsing JSON: document is too large, max size 16777216 bytes

How can I avoid this error without having to split the files before being uploaded in the Stage? The data is being sent by an app so every hour we'd have to load this data in when going live.
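
Roughly, the steps look like this (the object names below are placeholders for illustration, and I'm assuming gzip compression; only mytable and the json_data VARIANT column match the names mentioned above):

-- file format for compressed JSON (gzip is an assumption)
create or replace file format my_json_format
    type = json
    compression = gzip;

-- internal stage using that file format
create or replace stage my_json_stage
    file_format = my_json_format;

-- target table with a single VARIANT column
create or replace table mytable (json_data variant);

-- uploaded from the SnowSQL CLI with something like:
-- put file:///path/to/data.json.gz @my_json_stage auto_compress=false;

copy into mytable
from @my_json_stage
file_format = (format_name = 'my_json_format');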

3
This error means that your COPY INTO is likely trying to load the entire document into a single record. Could you provide a sample of the first few lines of a JSON file, as well as the COPY INTO statement you are executing? You might need to add some parameters to help Snowflake parse the file correctly. – Mike Walton

3 Answers

1 vote

I would assume that the error "Error parsing JSON: document is too large, max size 16777216 bytes" is raised because Snowflake thinks your whole file is a single JSON record, which practically does not look correct.

In the JSON file format, use STRIP_OUTER_ARRAY = TRUE; this will turn your single outer JSON array into individual JSON records.

Kindly try this and see if it works for you.


Update:

I have tested this and it works without any issues, even with a single file much bigger than the 16777216-byte max size.

Steps

--On Worksheet

create or replace stage testdb.testschema.stage_java;

--On SnowSQL

put file://C:/Users/sonij/OneDrive/Desktop/bigjson.json @STAGE_JAVA AUTO_COMPRESS=false SOURCE_COMPRESSION=NONE OVERWRITE=TRUE;

--On Worksheet

create or replace file format bigjson_ff
    type = json
    compression = none
    strip_outer_array = true
    allow_duplicate = true;

select
    parse_json($1)
from @stage_java/bigjson.json (file_format => 'bigjson_ff');
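
To then load the stripped records into the table from the question, a COPY INTO along these lines should work (mytable with its single VARIANT column is taken from the question and was not part of the tested steps above):

-- mytable (json_data variant) is assumed from the question
copy into mytable
from @stage_java/bigjson.json
file_format = (format_name = 'bigjson_ff');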

0 votes

Splitting the files won't help here, I'm afraid. Although Snowflake recommends compressed files of 10 to 100 MB for loading, it can handle bigger files as well.

The problem is probably the size of a single JSON record (or of something Snowflake thinks is a single JSON record).

You can try loading your file with ON_ERROR = CONTINUE to see whether anything loads at all. If it does, you'll know which rows are problematic; if not, you'll know that all the rows are being treated as a single record.
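
For example, a minimal sketch of such a load, using mytable from the question (the stage and file names here are placeholder assumptions):

-- my_json_stage and bigfile.json are hypothetical names
copy into mytable
from @my_json_stage/bigfile.json
file_format = (type = json)
on_error = continue;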

0 votes

The maximum size of a VARIANT value is 16 MB, so if any individual JSON "field" in each record (once you've applied STRIP_OUTER_ARRAY = true) is greater than this, you won't be able to load it using a simple "copy into..." approach. I can think of two possible alternatives, neither as simple nor as flexible as the COPY INTO statement, but which will, at least, get your data into Snowflake:

  1. Use an ETL tool that can read JSON, extract all the individual fields and write the data to table(s)/columns

  2. Given that Snowflake enables SQL querying of JSON files in a stage, insert the individual fields into tables/columns using an "INSERT INTO... SELECT..." statement rather than a "COPY INTO..." statement (see the sketch below).
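
A minimal sketch of option 2, assuming a staged file, a JSON file format with STRIP_OUTER_ARRAY enabled, and a target table with two hypothetical fields (id and name); all of the object and field names here are assumptions for illustration:

-- my_target_table, my_json_stage, my_json_format and the id/name fields are hypothetical
insert into my_target_table (id, name)
select
    $1:id::number,
    $1:name::string
from @my_json_stage/bigfile.json (file_format => 'my_json_format');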