2
votes

I am trying to load JSON file from Staging area (S3) into Stage table using COPY INTO command.

Table:

create or replace TABLE stage_tableA  (
    RAW_JSON VARIANT NOT NULL
);

Copy Command:

copy into stage_tableA from @stgS3/filename_45.gz  file_format = (format_name = 'file_json')

Got the below error when executing the above (sample provided)

SQL Error [100069] [22P02]: Error parsing JSON: document is too large, max size 16777216 bytes If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.

When I had put "ON_ERROR=CONTINUE" , records got partially loaded, i.e until the record with more than max size. But no records after the Error record was loaded.

Was "ON_ERROR=CONTINUE" supposed to skip only the record that has max size and load records before and after it ?

3

3 Answers

1
votes

Yes, the ON_ERROR=CONTINUE skips the offending line and continues to load the rest of the file.

To help us provide more insight, can you answer the following:

  • How many records are in your file?
  • How many got loaded?
  • At what line was the error first encountered?

You can find this information using the COPY_HISTORY() table function

0
votes

Try setting the option strip_outer_array = true for file format and attempt the loading again.

The considerations for loading large size semi-structured data are documented in the below article:

https://docs.snowflake.com/en/user-guide/semistructured-considerations.html

0
votes

I partially agree with Chris. The ON_ERROR=CONTINUE option only helps if the there are in fact more than 1 JSON objects in the file. If it's 1 massive object then you would simply not get an error or the record loaded when using ON_ERROR=CONTINUE.

If you know your JSON payload is smaller than 16mb then definitely try the strip_outer_array = true. Also, if your JSON has a lot of nulls ("NULL") as values use the STRIP_NULL_VALUES = TRUE as this will slim your payload as well. Hope that helps.