
I'm trying to connect to Snowflake via Spark in a JupyterHub notebook, but I can't because I'm unable to load the JDBC connector per Snowflake's documentation: https://docs.snowflake.net/manuals/user-guide/spark-connector-use.html#using-the-connector-with-python. I have been able to install the Python connector, snowflake-connector-python.

Is there a good way to start the Jupyter kernel with the JDBC connector? Here's the code I'm trying to run; it's copied from Snowflake's documentation:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

# Build the conf first, then hand it to the SparkContext
spark_conf = SparkConf().setMaster('local').setAppName('<YOUR_APP_NAME>')
sc = SparkContext(conf=spark_conf)
spark = SQLContext(sc)

# Set options below
sfOptions = {
  "sfURL" : "<account_name>.snowflakecomputing.com",
  "sfAccount" : "<account_name>",
  "sfUser" : "<user_name>",
  "sfPassword" : "<password>",
  "sfDatabase" : "<database>",
  "sfSchema" : "<schema>",
  "sfWarehouse" : "<warehouse>",
}

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
  .options(**sfOptions) \
  .option("query",  "select * from MYTABLE) \
  .load()

df.show()
Comments:

Try loading the jars through the SparkSession builder config; see the answer below. – Ankur Srivastava

Are you getting any error messages from the code above? – Rachel McGuigan

1 Answer


Try loading it this way:

from pyspark.sql import SparkSession

# <path-to> is a placeholder for wherever you downloaded the two jars
spark = SparkSession \
    .builder \
    .config("spark.jars", "file://<path-to>/snowflake-jdbc-3.8.0.jar,file://<path-to>/spark-snowflake_2.11-2.4.13-spark_2.4.jar") \
    .config("spark.repl.local.jars", "file://<path-to>/snowflake-jdbc-3.8.0.jar,file://<path-to>/spark-snowflake_2.11-2.4.13-spark_2.4.jar") \
    .config("spark.sql.catalogImplementation", "in-memory") \
    .getOrCreate()
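With the session created this way, the read from the question should then resolve the connector, e.g. reusing the sfOptions dict defined above:

df = spark.read.format("net.snowflake.spark.snowflake") \
    .options(**sfOptions) \
    .option("query", "select * from MYTABLE") \
    .load()
df.show()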

Load the jars in-memory this way if you haven't already put them on the classpath; otherwise Spark will not find the required jars and will throw an error. The other way is to set the classpath with both of the above jars.
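If you prefer the classpath route instead, one option is to pass both jars to spark-submit before any Spark object is created; a sketch, assuming the jars were downloaded to <path-to> (PYSPARK_SUBMIT_ARGS is only read when the JVM starts, so set it before the first SparkSession):

import os

# Hypothetical paths: <path-to> stands for wherever the two jars live.
# Must run before any SparkSession/SparkContext is created in this kernel.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars <path-to>/snowflake-jdbc-3.8.0.jar,"
    "<path-to>/spark-snowflake_2.11-2.4.13-spark_2.4.jar "
    "pyspark-shell"
)

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName("<YOUR_APP_NAME>").getOrCreate()

With --packages instead of --jars you can give the Maven coordinates (net.snowflake:snowflake-jdbc:3.8.0 and net.snowflake:spark-snowflake_2.11:2.4.13-spark_2.4) and let Spark fetch them, provided the notebook host has network access.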