0
votes

I am exporting GA360 table from Big query to snowflake as json format using bq cli command. I am losing some fields when I load it as table in snowflake. I use the copy command to load my json data from GCS external stage in snowflake to snowflake tables. But, I am missing some fields that are part of nested array. I even tried compressing the file when I export to gcs but I still loose data. Can someone suggest me how I can do this. I don't want to flatten the table in bigquery and transfer that. My daily table size is minimum of 1.5GB to maximum of 4GB.

bq extract \
  --project_id=myproject \
  --destination_format=NEWLINE_DELIMITED_JSON \
  --compression GZIP \
  datasetid.ga_sessions_20191001 \
gs://test_bucket/ga_sessions_20191001-*.json

I have set up my integration, file format, and stage in snowflake. I copying data from this bucket to a table that has one variant field. The row count matches with Big query but the fields are missing. I am guessing this is due to the limit snowflake has where each variant column should be of 16MB. Is there some way I can compress each variant field to be under 16MB?

1
What fields are you missing?Felipe Hoffa
I am missing hits.transaction and all it's struct @FelipeHoffausersas

1 Answers

2
votes

I had no problem exporting GA360, and getting the full objects into Snowflake.

First I exported the demo table bigquery-public-data.google_analytics_sample.ga_sessions_20170801 into GCS, JSON formatted.

Then I loaded it into Snowflake:


create or replace table ga_demo2(src variant);

COPY INTO ga_demo2
FROM 'gcs://[...]/ga_sessions000000000000'
FILE_FORMAT=(TYPE='JSON');

And then to find the transactionIds:

SELECT src:visitId, hit.value:transaction.transactionId
FROM ga_demo1, lateral flatten(input => src:hits) hit
WHERE src:visitId='1501621191'
LIMIT 10

enter image description here

Cool things to notice:

  • I read the GCS files easily from Snowflake deployed in AWS.
  • JSON manipulation in Snowflake is really cool.

See https://hoffa.medium.com/funnel-analytics-with-sql-match-recognize-on-snowflake-8bd576d9b7b1 for more.