0
votes

I'm trying to access a database in a private subnet from an AWS Glue job script. As far as I can see in the documentation, one can create a data source using different "connection types" and the appropriate "connection options", but none of them support VPC settings.

The only thing that supports VPC settings is an AWS Glue Connection, but I cannot find a way to create a Spark data source from an AWS Glue Connection.

Or is there perhaps a workaround?


1 Answer

1
votes

See step 8 in this guide. After you add your Glue JDBC connection, create a crawler to import the table metadata from the source database into the AWS Glue Data Catalog.

Then you can access the table within a Glue job like this:

df = glueContext.create_dynamic_frame.from_catalog(database = "db1", table_name = "table1")

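Or with Spark SQL (this requires the job to use the Glue Data Catalog as its metastore, for example via the --enable-glue-datacatalog job parameter):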

df = spark.sql("SELECT * FROM db1.table1")
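
If you would rather build a plain Spark JDBC data source from the Glue Connection itself, without crawling first, the following is a minimal sketch of one way to do it. It relies on GlueContext.extract_jdbc_conf, which returns the connection's JDBC properties as a dict. The helper names jdbc_options_from_conf and read_table_via_connection are hypothetical, as is the connection name in the usage note. The sketch assumes the connection is also attached to the job (so the job runs inside the VPC) and that a suitable JDBC driver is available to the job.

```python
def jdbc_options_from_conf(conf, table):
    """Map a dict shaped like extract_jdbc_conf() output to Spark JDBC options.

    Hypothetical helper, not part of the Glue API.
    """
    # Newer Glue versions also return "fullUrl" (the URL as entered on the
    # connection); fall back to "url" when it is absent.
    url = conf.get("fullUrl") or conf["url"]
    return {
        "url": url,
        "user": conf["user"],
        "password": conf["password"],
        "dbtable": table,
    }


def read_table_via_connection(glue_context, connection_name, table):
    """Read one table as a Spark DataFrame using a Glue Connection's settings.

    Hypothetical helper; glue_context is expected to be an awsglue GlueContext.
    """
    # Fetch the JDBC properties stored on the Glue Connection.
    conf = glue_context.extract_jdbc_conf(connection_name)
    options = jdbc_options_from_conf(conf, table)
    # Standard Spark JDBC reader, driven by the connection's properties.
    return glue_context.spark_session.read.format("jdbc").options(**options).load()
```

Inside a job this would be called as df = read_table_via_connection(glueContext, "my-connection", "table1"), where "my-connection" stands in for the name of your own Glue Connection. The result is a regular Spark DataFrame rather than a DynamicFrame.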