As Glue is serverless, there is no `/path/to/library/dir`.

Python processes in Glue need a couple of things in order to connect to an external database such as your Oracle server.
Python libraries must be packaged as an `.egg` or `.whl`, uploaded to S3, and the location of those files must be specified when creating a job (the *Python Library Path* field). This applies to any library that you author or that you would normally `pip install`, since neither is available in the environment AWS provides for Glue processes. So you'd need to build an `.egg` for `cx_Oracle` locally, upload it to S3, and provide the path in *Python Library Path* when creating your job. If you have already created the job, you can edit it and provide the `s3://path/to/cx_Oracle.egg` there.
Secrets such as connection credentials must be fetched from a secure external service by the ETL script. One option is to store the Oracle connection credentials in Glue itself: from the AWS Glue console, go to Connections, add a JDBC connection, and save your database credentials.
In your ETL script, use `boto3.client('glue').get_connection` to retrieve the connection details, then connect to the database with the `cx_Oracle` library you uploaded. Here's an example snippet that you would need to adapt and include in your ETL script:
import boto3
import cx_Oracle as orcl

glue = boto3.client('glue')

# Fetch the connection you saved in the Glue console
resp = glue.get_connection(Name='my-oracle-connection')
props = resp['Connection']['ConnectionProperties']

# JDBC_CONNECTION_URL looks like jdbc:oracle:thin://@host:1521/service;
# strip the JDBC prefix (and any leading '@') to get an EZConnect-style
# DSN that cx_Oracle understands
dsn = props['JDBC_CONNECTION_URL'].split('//')[-1].lstrip('@')
user = props['USERNAME']
pw = props['PASSWORD']

db = orcl.connect(user, pw, dsn)  # `db` is now a connection to your Oracle DB
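From there it's standard `cx_Oracle` usage, e.g. (the table name is hypothetical):

cur = db.cursor()
cur.execute('SELECT * FROM my_table')  # hypothetical table
for row in cur.fetchall():
    print(row)
cur.close()
db.close()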