2
votes

I have just working in a cloud database project involving some big data processing and need to quickly migrate data directly from bigquery to snowflake. I was thinking if there is any direct way to push data from bigquery to snowflake without any intermediate storage.

Please let me know there is a utility or api or anything other way bigquery provides to push data to snowflake. Appreciate your help.

2
I highly recommend using the intermediate storage approach. It is faster to unload from BigQuery and faster to load to Snowflake with that intermediate step.Mike Walton

2 Answers

4
votes

If this is a one-time need, you can perform it in two steps, exporting the table data to Google's Cloud Storage and then bulk loading from the Cloud Storage onto Snowflake.

For the export stage, use BigQuery's inbuilt export functionality or the BigQuery Storage API if the data is very large. All of the export formats supported by BigQuery (CSV, JSON or AVRO) are readily supported by Snowflake, so a data transformation step may not be required.

Once the export's ready on the target cloud storage address, use Snowflake's COPY INTO <table> with the external location option (or named external stages) to copy them into a Snowflake-managed table.

2
votes

If you want to move all your data to snowflake from bigquery, so I have few assumption 1. You have all your data in gcs itself. 2. You can connect the snowflake cluster from gcp.

Now your problem shrinks down to moving your data from gcs bucket to snowflake directly. SO now you can run copy command like

COPY INTO [<namespace>.]<table_name>
     FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
                    TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_<n>_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS ]

where external location is

externalLocation ::=
  'gcs://<bucket>[/<path>]'
  [ STORAGE_INTEGRATION = <integration_name> ]
  [ ENCRYPTION = ( [ TYPE = 'GCS_SSE_KMS' ] [ KMS_KEY_ID = '<string>' ] | [ TYPE = NONE ] ) ]

The documentation for the same can be found here and here