I am trying to connect to Redshift and run simple queries from a Glue DevEndpoint (this is a requirement), but I cannot seem to connect.
The following code just times out:
df = spark.read \
    .format('jdbc') \
    .option("url", "jdbc:redshift://my-redshift-cluster.c512345.us-east-2.redshift.amazonaws.com:5439/dev?user=myuser&password=mypass") \
    .option("query", "select distinct(tablename) from pg_table_def where schemaname = 'public'; ") \
    .option("tempdir", "s3n://test") \
    .option("aws_iam_role", "arn:aws:iam::147912345678:role/my-glue-redshift-role") \
    .load()
What could be the reason?
I checked the URL, user, and password, and also tried different IAM roles, but every time it just hangs.
I also tried without an IAM role (just the URL, user/password, and a schema/table that already exists there), and it also hangs and then times out:
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:redshift://my-redshift-cluster.c512345.us-east-2.redshift.amazonaws.com:5439/dev") \
    .option("dbtable", "public.test") \
    .option("user", "myuser") \
    .option("password", "mypass") \
    .load()
Reading data (directly in the Glue SSH terminal) from S3 or from Glue catalog tables works fine, so I know that Spark and DataFrames are working. There is just something wrong with the connection to Redshift, but I am not sure what.
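Since both attempts hang rather than fail with an authentication error, one way to narrow this down might be to test plain TCP reachability of the cluster endpoint from the DevEndpoint, independent of Spark and the JDBC driver. The snippet below is only a sketch under that assumption: `can_connect` is a hypothetical helper name (not part of Glue, Spark, or any Redshift library), and it only checks that a TCP connection opens on the given port, not that a login would succeed.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only. It checks network reachability,
    not Redshift credentials or permissions.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        sock = socket.create_connection((host, port), timeout)
    except (OSError, IOError):
        # Covers DNS failures, refused connections, and timeouts.
        return False
    sock.close()
    return True
```

Calling `can_connect("my-redshift-cluster.c512345.us-east-2.redshift.amazonaws.com", 5439)` from the Glue SSH terminal would then show whether the port is reachable at all. A `False` result would suggest the hang is at the network level (for example security group rules or VPC routing between the DevEndpoint and the cluster) rather than in Spark, the credentials, or the IAM role.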