0 votes

I want to use the Snowflake Spark Connector to export data from a client's Snowflake instance.

The issue I am having is that the account the client has shared with me only has reader access, so I am unable to use the Snowflake Spark Connector: my job fails during the stage-creation step because I don't have the privileges to create an internal stage on the client's Snowflake instance.

I found from this blog (Step 4, "Configuration of the staging area for the connector in AWS S3") that you can configure an external stage location, which can, for example, be in my own account, so I would not require any additional access on the client's Snowflake instance.

The only issue is that I use Google Cloud Storage, not AWS S3, and I cannot find documentation explaining how to use a Google Cloud Storage bucket as an external stage.

Here I found docs on how to provide custom AWS credentials, which say that I need to provide the following parameters (a sketch of how they are passed to the connector follows the list):

  • awsAccessKey
  • awsSecretKey
  • tempdir
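
For reference, this is roughly how those S3 options are set in a Spark job. It is a minimal sketch only; the account details, bucket, and table name are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("snowflake-export").getOrCreate()

val sfOptions = Map(
  "sfURL"       -> "<account>.snowflakecomputing.com", // placeholders
  "sfUser"      -> "<user>",
  "sfPassword"  -> "<password>",
  "sfDatabase"  -> "<database>",
  "sfSchema"    -> "<schema>",
  "sfWarehouse" -> "<warehouse>",
  // External transfer: the connector stages data in this bucket instead
  // of creating an internal stage on the Snowflake side.
  "tempdir"      -> "s3n://my-bucket/snowflake-staging", // placeholder bucket
  "awsAccessKey" -> sys.env("AWS_ACCESS_KEY_ID"),
  "awsSecretKey" -> sys.env("AWS_SECRET_ACCESS_KEY")
)

val df = spark.read
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("dbtable", "MY_TABLE") // placeholder table
  .load()
```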

I need help figuring out which options have to be configured to use Google Cloud Storage as an external stage location.


2 Answers

0 votes

Have you already seen these docs?

  • https://docs.snowflake.net/manuals/user-guide/data-load-gcs.html
  • https://docs.snowflake.net/manuals/user-guide/data-load-considerations.html
  • https://docs.snowflake.net/manuals/user-guide/data-load-prepare.html

From what I gathered, you do have to sign up for the preview feature in order to enable GCS support on your Snowflake account. Then:

  1. set up the storage integration for Google Cloud Storage,
  2. then copy the data from the bucket or a named @stage (see the sketch after this list).
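
In case it helps, here is a minimal sketch of those two steps driven from Spark, assuming the preview feature is already enabled on the account. The integration, stage, bucket, and table names are all placeholders, and the SQL is issued through the connector's Utils.runQuery helper:

```scala
import net.snowflake.spark.snowflake.Utils

val sfOptions = Map(
  "sfURL"       -> "<account>.snowflakecomputing.com", // placeholders
  "sfUser"      -> "<user>",
  "sfPassword"  -> "<password>",
  "sfDatabase"  -> "<database>",
  "sfSchema"    -> "<schema>",
  "sfWarehouse" -> "<warehouse>"
)

// 1. Storage integration over the bucket (one-time setup; typically
//    needs ACCOUNTADMIN privileges):
Utils.runQuery(sfOptions,
  """CREATE STORAGE INTEGRATION gcs_int
    |  TYPE = EXTERNAL_STAGE
    |  STORAGE_PROVIDER = GCS
    |  ENABLED = TRUE
    |  STORAGE_ALLOWED_LOCATIONS = ('gcs://my-bucket/staging/')""".stripMargin)

// 2. Stage over the bucket, then copy the data in:
Utils.runQuery(sfOptions,
  "CREATE STAGE my_gcs_stage URL = 'gcs://my-bucket/staging/' STORAGE_INTEGRATION = gcs_int")
Utils.runQuery(sfOptions,
  "COPY INTO my_table FROM @my_gcs_stage FILE_FORMAT = (TYPE = CSV)")
```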

For more details, check out https://docs.snowflake.net/manuals/user-guide/data-load-gcs.html

Let me know how it goes; I am happy to follow up!

Other useful questions:

Load data from bucket google cloud

0 votes

While the Apache Hadoop (and, by extension, Apache Spark) cloud storage connectors now support Google Cloud Storage via a gs:// URL and the associated configuration, Snowflake's Spark Connector does not yet support Google Cloud Storage for its staging operations at the time of posting this.
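
For context, the Hadoop-side support mentioned above looks roughly like this. A minimal sketch, assuming the gcs-connector jar is on the classpath; the bucket, key-file path, and data format are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gcs-read").getOrCreate()

// Hadoop configuration for the GCS connector: register the gs:// filesystem
// and authenticate with a service-account key file.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.gs.impl",
  "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
hadoopConf.set("google.cloud.auth.service.account.enable", "true")
hadoopConf.set("google.cloud.auth.service.account.json.keyfile",
  "/path/to/key.json") // placeholder path

// Spark itself can then address the bucket with a gs:// URL:
val df = spark.read.parquet("gs://my-bucket/data/") // placeholder bucket
```

So while Spark can read and write gs:// paths this way, the Snowflake connector's tempdir staging location still has to point at a cloud storage provider the connector supports.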