AWS Glue read from RDS database that's in VPC

Question

I have an RDS database that is sitting in a VPC. My ultimate goal is to run a nightly job that takes the data from RDS and stores it in Redshift. I am currently doing this using Glue and Glue connections. I am able to write to RDS/Redshift using connections with the following line:

datasource2 = DynamicFrame.fromDF(dfFinal, glueContext, "scans")

output = glueContext.write_dynamic_frame.from_jdbc_conf(frame = datasource2, catalog_connection = "MPtest", connection_options = {"database" : "app", "dbtable" : "scans"})

Here dfFinal is my final data frame after a bunch of transformations that are not essential to this post. That code works fine; however, I would like to modify it so I can read a table from RDS into a data frame.

Since the RDS database is in a VPC, I would like to use the catalog_connection parameter, but the DynamicFrameReader class has no from_jdbc_conf method and thus no obvious way to use my Glue connection.

I have seen posts that say you could use a method like this:

url = "jdbc:postgresql://host/dbName"
properties = {
    "user": "user",
    "password": "password",
}
df = spark.read.jdbc(url=url, table="table", properties=properties)

But when I try that it times out because it's not a publicly accessible database. Any suggestions?

Romi kukreja · Accepted Answer · 2020-11-14T02:09:42

You are on the right track with using Glue connections.

Define a Glue connection of type JDBC for your Postgres instance:

Type              JDBC
JDBC URL          jdbc:postgresql://<RDS ip>:<RDS port>/<database_name>
VPC Id            <VPC of RDS instance>
Subnet            <subnet of RDS instance>
Security groups   <Security Group allowed to connect to RDS>
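The same connection can presumably also be created from code instead of the console. The following is a minimal sketch using the boto3 Glue client's create_connection call. The helper names build_connection_input and create_connection are hypothetical, and every argument value is a placeholder you would supply yourself. Note that the API takes a subnet, security groups and an availability zone rather than a VPC Id; the VPC is implied by the subnet.

```python
def build_connection_input(name, jdbc_url, username, password,
                           subnet_id, security_group_ids, availability_zone):
    """Build the ConnectionInput dict for the Glue CreateConnection API.

    Hypothetical helper: it only assembles the request payload.
    """
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "USERNAME": username,
            "PASSWORD": password,
        },
        # Subnet and security groups are what let the Glue job reach
        # the RDS instance inside its VPC.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


def create_connection(connection_input):
    """Send the payload to Glue (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the builder works without AWS access

    glue = boto3.client("glue")
    glue.create_connection(ConnectionInput=connection_input)
```

Storing the password in the connection properties mirrors what the console form does; whether that is acceptable depends on your own security requirements.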

Edit the Glue job and select the Glue connection so that it appears under "Required Connections".

Create a connection options dictionary:


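# `connection` and `table` are placeholders: an object holding the Glue
# connection's JDBC URL and credentials, and the name of the table to read.
options = {
    'url': connection.jdbc_url,
    'user': connection.username,
    'password': connection.password,
    'dbtable': table,
}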

Use the options dictionary to create a DynamicFrame that reads from the table:


table_ddf = glueContext.create_dynamic_frame.from_options(
    connection_type='postgresql',
    connection_options=options,
    transformation_ctx=transformation_ctx,
)
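The answer does not say where the URL and credentials in the options dictionary come from. One possible source, sketched here as an assumption and not as the answerer's own code, is glueContext.extract_jdbc_conf, which returns the settings stored in a Glue connection. The helpers build_read_options and read_table are hypothetical names; "MPtest", "app" and "scans" are taken from the question. As far as I know, newer Glue versions return the complete URL under "fullUrl" while "url" omits the database name, so the sketch handles both, but check this against your Glue version.

```python
def build_read_options(jdbc_conf, table, database=None):
    """Build connection_options for create_dynamic_frame.from_options.

    jdbc_conf is a dict shaped like the result of
    glueContext.extract_jdbc_conf(...). Hypothetical helper, not Glue API.
    """
    # Assumption: prefer "fullUrl" (URL including the database) when the
    # Glue version provides it; otherwise append the database to "url".
    url = jdbc_conf.get("fullUrl")
    if not url:
        url = jdbc_conf["url"]
        if database:
            url = url.rstrip("/") + "/" + database
    return {
        "url": url,
        "user": jdbc_conf["user"],
        "password": jdbc_conf["password"],
        "dbtable": table,
    }


def read_table(glue_context, connection_name, table, database=None):
    """Read one table through a Glue connection; return a Spark DataFrame."""
    # Only works inside a Glue job that lists the connection under
    # "Required Connections", so the job runs with access to the RDS VPC.
    jdbc_conf = glue_context.extract_jdbc_conf(connection_name)
    options = build_read_options(jdbc_conf, table, database)
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="postgresql",
        connection_options=options,
    )
    return frame.toDF()  # convert the DynamicFrame to a DataFrame


# Inside the Glue job, with the names from the question:
# df = read_table(glueContext, "MPtest", table="scans", database="app")
```

Reading the credentials from the connection keeps them out of the job script, and the returned DataFrame can go through the same kind of transformations that produce dfFinal in the question.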
