0
votes

I'm trying to connect to my PostgreSQL RDS instance directly from my AWS Glue script. Connecting with the generated code works, but it doesn't work with a JDBC-type connection. Here's the code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
import pyspark.sql.functions as F
from pyspark.sql.functions import *

## Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()

job.commit()

Part of the error:

An error occurred while calling o74.load. : java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out. at com.amazon.redshift.client.PGClient.connect ....

Extra information:

  • It actually worked before, but I can't figure out what has changed since then
  • I've tested the connections by using the "Test connection" in AWS Glue and it works
  • I've configured the VPC security group for the RDS to open inbound/outbound to the same security group (this is based on this guide: https://docs.aws.amazon.com/glue/latest/dg/setup-vpc-for-glue-access.html)

Thanks in advance and let me know if you need more information.

1
Can you try opening outbound traffic to everyone? Also, can you launch an EC2 instance in the same VPC with a network config similar to your Glue job's and confirm whether you are able to connect? - Prabhakar Reddy
@PrabhakarReddy Thanks for commenting, but apparently it was because of the port 🤦‍♂️ - Amree
Yep, I overlooked your script since you mentioned it ran fine before! - Prabhakar Reddy

1 Answer

1
votes

I just found out the reason: I didn't specify the port in the JDBC URL. I don't remember needing to put the port in before, but everything works after adding it.

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host:5432/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()

The only change is the explicit :5432 port in the JDBC URL.
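
For anyone hitting the same "Connection timed out" error: before digging into the JDBC options, it can help to rule out network problems by testing raw TCP reachability to the database port. A minimal sketch (the endpoint name below is a placeholder, not from this question):

```python
import socket

def can_connect(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection handles DNS resolution and the connect timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures
        return False

# Example with a hypothetical RDS endpoint:
# can_connect("mydb.xxxx.us-east-1.rds.amazonaws.com", 5432)
```

If this returns False from an EC2 instance with the same VPC/security-group setup as the Glue job, the issue is networking (security groups, routing, NACLs) rather than the Spark code.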