Create dynamic frame from options (from rds - mysql) providing a custom query with where clause

Question

I want to create a DynamicFrame in my Glue job from an Aurora-rds mysql table. Can I create DynamicFrame from my rds table using a custom query - having a where clause? I dont want to read the entire table every time in my DynamicFrame and then filter later. Looked at this website but didnt find any option here or elsewhere, https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html

Construct JDBC connection options

connection_mysql5_options = { "url": "jdbc:mysql://:3306/db", "dbtable": "test", "user": "admin", "password": "pwd"}

Read DynamicFrame from MySQL 5

df_mysql5 = glueContext.create_dynamic_frame.from_options(connection_type="mysql", connection_options=connection_mysql5_options)

Is there any way to give a where clause and say select only top 100 rows from test table, say it has a column named "id" and I want to fetch using this query:

select * from test where id<100;

Appreciate any help. Thank you!

I found this today, and probably is what you are looking for too stackoverflow.com/questions/51388993/… — tubadc

Guillermo AMS Guillermo AMS · Accepted Answer · 2021-03-30T15:05:17

The way I was able to provide a custom query was by creating a Spark DataFrame and specifying it with options: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options

Then transform that DataFrame into a DynamicFrame using said class: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

tmp_data_frame = spark.read.format("jbdc")
.option("url", jdbc_url)
.option("user", username)
.option("password", password)
.option("query", "select * from test where id<100")
.load()

dynamic_frame = DynamicFrame.fromDF(tmp_data_frame, glueContext)

Create dynamic frame from options (from rds - mysql) providing a custom query with where clause

2 Answers