I am trying to connect to Snowflake from an EMR cluster using PySpark.

I am using these two jars in spark-submit.

  • snowflake-jdbc-3.5.2.jar
  • spark-snowflake_2.11-2.7.0-spark_2.4.jar

But it is failing with a connection timeout error. I have the correct proxy configured for the EMR cluster. From the same EC2 instance (the EMR master) I am able to connect to Snowflake using snowsql and the Python connector.

I am not sure why the connection times out only from PySpark.

Can you share the code snippet you are following? Sometimes the connection fails because of proxy issues. In such cases the http_proxy, https_proxy, HTTP_PROXY, HTTPS_PROXY and no_proxy settings need to be set. - Ram Ghadiyaram
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
sfOptions = {"sfURL": "XXX", "sfAccount": "XX", "sfUser": "XX", "sfPassword": "xx", "sfDatabase": "xx", "sfSchema": "xx", "sfWarehouse": "xx"}
query = "select * from testdb.test1.t1"
df = spark.read.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions).option("query", query).load()
- Ankit Patel CONT
I am able to connect to Snowflake using snowsql and the Python connector from the same EC2 instance. - Ankit Patel CONT
Have you checked the proxy settings as mentioned above? Everything else looks good. Also, you need to use biz_pstage_work as the schema for Spark to connect to Snowflake. - Ram Ghadiyaram
Can you post the error stack trace here? Below are the commands I used when running from the EMR shell (with some older versions of the jars):
spark-submit --packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4 SparkConnPython.py
pyspark --packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4
- Ankur Srivastava

1 Answer

A connection timeout like this usually points to a network issue. You can use our SnowCD tool to run connectivity diagnostics from the cluster: https://docs.snowflake.com/en/user-guide/snowcd.html

Below are the commands I used when running from the EMR shell.

pyspark --packages net.snowflake:snowflake-jdbc:3.6.27,net.snowflake:spark-snowflake_2.12:2.4.14-spark_2.4

spark-submit --packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4 SparkConnPythonWithCert.py

spark-shell --packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4
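One thing worth checking, since snowsql and the Python connector already work from the same host: those clients typically pick up the http_proxy / https_proxy environment variables, whereas a JVM does not read them by default, so the JDBC driver used by the Spark connector may be trying to connect directly. The sketch below is only an illustration under that assumption. It turns a proxy URL into the Java proxy system properties (`http.proxyHost`, `https.proxyHost`, and so on, plus `http.useProxy`, which I believe the Snowflake JDBC documentation lists). The helper name `jvm_proxy_options` and the proxy host shown are hypothetical.

```python
from urllib.parse import urlparse


def jvm_proxy_options(proxy_url, no_proxy=""):
    """Translate a proxy URL into Java proxy system properties.

    Hypothetical helper for illustration; it only builds a string.
    """
    parsed = urlparse(proxy_url)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError("expected a proxy URL like http://host:port")
    host, port = parsed.hostname, parsed.port

    options = [
        "-Dhttp.useProxy=true",
        f"-Dhttp.proxyHost={host}",
        f"-Dhttp.proxyPort={port}",
        f"-Dhttps.proxyHost={host}",
        f"-Dhttps.proxyPort={port}",
    ]

    # no_proxy is comma-separated with leading-dot suffixes (".internal");
    # Java's http.nonProxyHosts uses '|' separators and '*' wildcards.
    hosts = [h.strip() for h in no_proxy.split(",") if h.strip()]
    if hosts:
        java_hosts = ["*" + h if h.startswith(".") else h for h in hosts]
        options.append("-Dhttp.nonProxyHosts=" + "|".join(java_hosts))

    return " ".join(options)


if __name__ == "__main__":
    # Placeholder proxy address; substitute the one configured on the cluster.
    print(jvm_proxy_options("http://proxy.example.internal:3128", "localhost,.internal"))
```

The resulting string could then be passed to both the driver and the executors, for example with `--conf "spark.driver.extraJavaOptions=..."` and `--conf "spark.executor.extraJavaOptions=..."` on the spark-submit command line. Quote the value, because `|` is special to the shell. Whether this resolves the timeout depends on the actual network setup, so the SnowCD output is still the first thing to check.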