I am trying to connect to Snowflake from PySpark on my local machine.
My code is shown below.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

# Build the configuration first, then create the context from it
spark_conf = SparkConf().setMaster('local').setAppName('sf_test')
sc = SparkContext(conf=spark_conf)
spark = SQLContext(sc)
sfOptions = {
    "sfURL" : "someaccount.some.address",
    "sfAccount" : "someaccount",
    "sfUser" : "someuser",
    "sfPassword" : "somepassword",
    "sfDatabase" : "somedb",
    "sfSchema" : "someschema",
    "sfWarehouse" : "somedw",
    "sfRole" : "somerole",
}
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
I get an error when I run this particular chunk of code.
df = spark.read.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions).option("query","""select * from
"PRED_ORDER_DEV"."SALES"."V_PosAnalysis" pos
ORDER BY pos."SAPAccountNumber", pos."SAPMaterialNumber" """).load()
Py4JJavaError: An error occurred while calling o115.load.
: java.lang.ClassNotFoundException: Failed to find data source: net.snowflake.spark.snowflake. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
I have downloaded the Spark connector and JDBC driver JAR files and added their folder to CLASSPATH. This is how I start PySpark and what CLASSPATH is set to:
pyspark --packages net.snowflake:snowflake-jdbc:3.11.1,net.snowflake:spark-snowflake_2.11:2.5.7-spark_2.4
CLASSPATH = C:\Program Files\Java\jre1.8.0_241\bin;C:\snowflake_jar
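The --packages flag only applies to the pyspark shell it is passed to, so a script started with plain python would need the same Maven coordinates supplied some other way. Below is a minimal sketch of one such way, assuming the script runs outside that shell and that Spark reads the PYSPARK_SUBMIT_ARGS environment variable when it launches the JVM. The helper names build_submit_args and make_session are hypothetical and only for illustration; the coordinates are the ones from the command above.

```python
import os

# Maven coordinates copied from the pyspark --packages command above.
SNOWFLAKE_PACKAGES = [
    "net.snowflake:snowflake-jdbc:3.11.1",
    "net.snowflake:spark-snowflake_2.11:2.5.7-spark_2.4",
]


def build_submit_args(packages):
    """Return a PYSPARK_SUBMIT_ARGS value that requests the given packages."""
    # By convention the value ends with the "pyspark-shell" token.
    return "--packages {} pyspark-shell".format(",".join(packages))


def make_session(app_name="sf_test"):
    """Create a local SparkSession with the packages requested (hypothetical helper)."""
    # Assumption: this runs before any SparkContext exists, because the
    # variable is only read when the JVM is launched.
    os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(SNOWFLAKE_PACKAGES)
    # Imported inside the function so build_submit_args stays usable
    # even where PySpark is not installed.
    from pyspark.sql import SparkSession
    return SparkSession.builder.master("local").appName(app_name).getOrCreate()
```

With a session created this way, the same spark.read.format(SNOWFLAKE_SOURCE_NAME) call would be issued against the object returned by make_session(). Whether this resolves the ClassNotFoundException depends on the connector build matching the installed Spark and Scala versions, which I have not confirmed.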
I want to be able to connect to Snowflake and read data with PySpark. Any help would be much appreciated!