As Glue is serverless, there is no `/path/to/library/dir`.

Python processes in Glue need a couple of things in order to connect to an external database such as your Oracle server.
Python libraries must be packaged as an `.egg` or `.whl`, uploaded to S3, and the location of those files must be specified when creating a job (the *Python Library Path* field). This applies to any library that you author or that you would normally `pip install`, since neither is available in the environment AWS provides for Glue processes. So you'd need to build an `.egg` for `cx_Oracle` locally, upload it to S3, and provide the path in *Python Library Path* when creating your job. If you have already created the job, you can edit it and provide the `s3://path/to/cx_Oracle.egg` there.
Secrets such as connection credentials must be fetched from a secure external service by the ETL script. One option is to store the Oracle connection credentials in Glue itself: from the AWS Glue console, go to Connections, add a JDBC connection, and save your database credentials.
In your ETL script, use `boto3.client('glue').get_connection` to retrieve the connection details, then connect to the database with the `cx_Oracle` library you uploaded. Here's an example snippet that you would need to adapt and include in your ETL script:
import boto3
import cx_Oracle as orcl

glue = boto3.client('glue')

# Fetch the connection you saved in the Glue console
resp = glue.get_connection(Name='my-oracle-connection')
props = resp['Connection']['ConnectionProperties']

# JDBC_CONNECTION_URL looks like jdbc:oracle:thin://@host:1521/service;
# strip the JDBC prefix (and any leading '@') to get an EZConnect-style
# DSN that cx_Oracle understands
dsn = props['JDBC_CONNECTION_URL'].split('//')[-1].lstrip('@')
user = props['USERNAME']
pw = props['PASSWORD']

db = orcl.connect(user, pw, dsn)  # `db` is now a connection to your Oracle DB
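From there it's standard `cx_Oracle` usage, e.g. (the table name is hypothetical):

cur = db.cursor()
cur.execute('SELECT * FROM my_table')  # hypothetical table
for row in cur.fetchall():
    print(row)
cur.close()
db.close()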