4
votes

So, if you have this table in Snowflake:

create table t (x string, y string) as select '', null;

and you copy it to an external stage with file_format csv, you get this error if you don't set field_optionally_enclosed_by to something other than none:

Cannot unload empty string without file format option field_optionally_enclosed_by being specified.

so, let's say it's set to '"'.

create stage some_stg
url='s3://<some-bucket>/<some-dir>'
file_format = (type = csv field_optionally_enclosed_by='"' compression = none)
credentials = (aws_role = '<your-arn-for-snowflake>')

I'm sure this issue reproduces with an internal stage if you don't want to mess with getting snowflake to use your s3 bucket.

When you run a copy for table t above:

copy into @some_stg/t.csv from t overwrite = true;

you get a file (t_0_0_0.csv) that looks like this:

"","\N"

And after creating the equivalent table in postgres:

create table t (x varchar, y varchar);

When you load that into postgres with psql copy like this:

psql -h <host> -U <user> -c "copy t from stdin with csv null '\N'" < t_0_0_0.csv

The contents of t on postgres is:

x, y
"","\N"

Now this make sense because snowflake put the \N in double quotes, so the psql copy preserved it. If you edit t_0_0_0.csv and remove the double quotes around the \N:

"",\N

And run psql copy again then the \N is correctly converted to null

There does not appear to be a way to generate a csv file from snowflake that supports empty string and null that can be preserved loading into postgres. I messed with the snowflake configs EMPTY_FIELD_AS_NULL and NULL_IF which in snowflake's documentation it even speaks to this issue:

When unloading empty string data from tables, choose one of the following options:

Preferred: Enclose strings in quotes by setting the FIELD_OPTIONALLY_ENCLOSED_BY option, to distinguish empty strings from NULLs in output CSV files.

It does "distinguish" them but not in a way that psql copy can use without manipulating the file with sed beforehand.

Does anyone know how to generate a snowflake csv that preserves empty strings and nulls in a way that psql copy can reproduce?

1

1 Answers

2
votes

Did you tried with NULL_IF option in your file format, Following file format will unload your snowflake null data with empty.

CREATE OR REPLACE FILE FORMAT UPDATED_FORMAT_NAME
TYPE = 'CSV'
COMPRESSION = 'NONE'
FIELD_DELIMITER =','
NULL_IF=()