
I'm trying to connect to Snowflake via Spark in a JupyterHub notebook, but I can't because I'm unable to load the JDBC connector per Snowflake's documentation: https://docs.snowflake.net/manuals/user-guide/spark-connector-use.html#using-the-connector-with-python. I have been able to install the Python connector, snowflake-connector-python.

Is there a good way to start the Jupyter kernel with the JDBC connector? Here's the code I'm trying to run; it's copied from Snowflake's documentation:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

# Build the conf first, then hand it to the SparkContext
spark_conf = SparkConf().setMaster('local').setAppName('<YOUR_APP_NAME>')
sc = SparkContext(conf=spark_conf)
spark = SQLContext(sc)

# Set options below
sfOptions = {
  "sfURL" : "<account_name>.snowflakecomputing.com",
  "sfAccount" : "<account_name>",
  "sfUser" : "<user_name>",
  "sfPassword" : "<password>",
  "sfDatabase" : "<database>",
  "sfSchema" : "<schema>",
  "sfWarehouse" : "<warehouse>",
}

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
  .options(**sfOptions) \
  .option("query",  "select * from MYTABLE) \
  .load()

df.show()
Comments:

Try loading the jars through the SparkSession builder config; see the answer below. – Ankur Srivastava

Are you getting any error messages from the code above? – Rachel McGuigan

1 Answer


Try loading it this way:

from pyspark.sql import SparkSession

# <path-to> is a placeholder for wherever you downloaded the two jars
spark = SparkSession \
    .builder \
    .config("spark.jars", "file://<path-to>/snowflake-jdbc-3.8.0.jar,file://<path-to>/spark-snowflake_2.11-2.4.13-spark_2.4.jar") \
    .config("spark.repl.local.jars", "file://<path-to>/snowflake-jdbc-3.8.0.jar,file://<path-to>/spark-snowflake_2.11-2.4.13-spark_2.4.jar") \
    .config("spark.sql.catalogImplementation", "in-memory") \
    .getOrCreate()
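With the session created this way, the read from the question should then resolve the connector, e.g. reusing the sfOptions dict defined above:

df = spark.read.format("net.snowflake.spark.snowflake") \
    .options(**sfOptions) \
    .option("query", "select * from MYTABLE") \
    .load()
df.show()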

Load the jars in-memory this way if you haven't already put them on the classpath; otherwise Spark will not find the required jars and will throw an error. The other way is to set the classpath with both of the above jars.
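If you prefer the classpath route instead, one option is to pass both jars to spark-submit before any Spark object is created; a sketch, assuming the jars were downloaded to <path-to> (PYSPARK_SUBMIT_ARGS is only read when the JVM starts, so set it before the first SparkSession):

import os

# Hypothetical paths: <path-to> stands for wherever the two jars live.
# Must run before any SparkSession/SparkContext is created in this kernel.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars <path-to>/snowflake-jdbc-3.8.0.jar,"
    "<path-to>/spark-snowflake_2.11-2.4.13-spark_2.4.jar "
    "pyspark-shell"
)

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName("<YOUR_APP_NAME>").getOrCreate()

With --packages instead of --jars you can give the Maven coordinates (net.snowflake:snowflake-jdbc:3.8.0 and net.snowflake:spark-snowflake_2.11:2.4.13-spark_2.4) and let Spark fetch them, provided the notebook host has network access.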