
I am trying to connect to Redshift from Spark (running on Databricks):

from pyspark.sql import SQLContext

# On Databricks, sc and sqlContext come predefined in the notebook
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", ACCESS_KEY)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", SECRET_KEY)

# IP addresses from Redshift Security Group panel
IP_ADDRESSES_TO_ADD = ["1.2.3.4/32", "5.6.7.8/32"]
PORTS_TO_ADD = ["80", "443"]
PROTOCOLS_TO_ADD = ["tcp"]

# Read data from a query
df = sqlContext.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://XXX.XXX.eu-west-1.redshift.amazonaws.com:5439/REDSHIFT_DB?user=REDSHIFT_USER&password=REDSHIFT_PW&ssl=true&sslfactory=com.amazon.redshift.ssl.NonValidatingFactory") \
    .option("query", "select * FROM REDSHIFT_TABLE LIMIT 10") \
    .option("tempdir", "s3n://path/to/temp/") \
    .load()

However I am getting the following error:

java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out.

Am I missing something?

Check the ingress rules of the Redshift security group, and make sure it allows access from the IP address where you are running the script. - Vor

2 Answers


This looks like a network connectivity problem rather than a credentials issue. Verify that the machine running Spark can actually reach the Redshift endpoint.

To verify this, run:

telnet XXX.XXX.eu-west-1.redshift.amazonaws.com 5439

If the endpoint is reachable, you should see something like:

Trying <IP address>...
Connected to <Host name>.
Escape character is '^]'.

If the connection times out instead, your client's IP address is being blocked, most likely by the Redshift cluster's security group inbound rules.
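If telnet is not installed on the cluster nodes, the same reachability check can be sketched with Python's standard library (the hostname below is the placeholder from the question; substitute your own endpoint):

```python
import socket

def can_connect(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder endpoint from the question -- replace with your cluster's hostname:
# can_connect("XXX.XXX.eu-west-1.redshift.amazonaws.com", 5439)
```

A result of False here corresponds to the "Connection timed out" in the question and points at the security group rather than the JDBC URL.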


How are you spinning up the Databricks cluster nodes? Is it on demand? Every time the cluster terminates, you get a new set of IP addresses (EC2 instances) the next time it starts. So you need to make sure the newly assigned IP addresses are whitelisted in the Redshift security group's inbound rules.
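One way to automate that re-whitelisting is sketched below, assuming boto3 is available on the cluster and that the security group ID is the one attached to your Redshift cluster (the ID in the usage comment is hypothetical). The helper looks up the node's current public IP via AWS's checkip service and adds a /32 inbound rule for the Redshift port:

```python
import urllib.request

def cidr_for_ip(ip):
    """Redshift inbound rules take CIDR blocks; a single host is a /32."""
    return f"{ip}/32"

def whitelist_current_ip(security_group_id, port=5439):
    """Add an inbound rule for this machine's public IP (hypothetical helper)."""
    import boto3  # AWS SDK for Python; assumed installed on the cluster

    # checkip.amazonaws.com returns the caller's public IP as plain text
    my_ip = urllib.request.urlopen("https://checkip.amazonaws.com").read().decode().strip()
    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpProtocol="tcp",
        FromPort=port,
        ToPort=port,
        CidrIp=cidr_for_ip(my_ip),
    )

# Example (the group ID is an assumption -- use your cluster's own):
# whitelist_current_ip("sg-0123456789abcdef0")
```

Running something like this at cluster startup keeps the rules in sync with the freshly assigned node IPs; note it adds a new rule each time, so stale /32 entries should be pruned periodically.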