Is it possible to use AWS Glue Connection to create a data source?

Question

I'm trying to access a database in a private subnet from an AWS Glue job script. As far as I can see in the documentation, one can create a data source using different "connection types" and the appropriate "connection options", but these don't support VPC settings.

The only thing that supports VPC settings is an AWS Glue Connection, but I cannot find a way to create a Spark data source from an AWS Glue Connection.

Or maybe there is some workaround?

ya2410 · Accepted Answer · 2019-06-26T22:28:47

See step 8 in this guide: after you add your Glue JDBC connection, create a crawler to import table metadata from the source database into the AWS Glue Data Catalog.

Then you can access the table within a Glue job like this:

df = glueContext.create_dynamic_frame.from_catalog(database = "db1", table_name = "table1")

Or with Spark:

df = spark.sql("SELECT * FROM db1.table1")
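For context, here is a minimal sketch of how the catalog read above might sit inside a job script. The helper names read_catalog_table and make_glue_context are hypothetical, "db1" and "table1" are the placeholder names from the answer, and the sketch assumes the job itself has the Glue Connection attached so that it runs inside the VPC.

```python
def read_catalog_table(glue_context, database, table_name):
    """Read a crawled table from the Glue Data Catalog as a DynamicFrame.

    glue_context is expected to be an awsglue GlueContext; the table is
    assumed to have been created by a crawler that used the Glue Connection.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )


def make_glue_context():
    """Build a GlueContext (hypothetical helper; only works inside a Glue job).

    The imports live inside the function because pyspark and awsglue are
    provided by the Glue runtime and are usually not installed locally.
    """
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    return GlueContext(SparkContext.getOrCreate())
```

Inside a job this would be called as read_catalog_table(make_glue_context(), "db1", "table1"). Calling .toDF() on the returned DynamicFrame gives a regular Spark DataFrame if the rest of the script works with Spark APIs.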
