
I tried using the PyGreSQL modules:

import pg
import pgdb

but it says the modules were not found when running on AWS Glue Spark.

Their Developer Guide, https://docs.aws.amazon.com/glue/latest/dg/glue-dg.pdf, says it's available for Python Shell though.

Can anyone else confirm this? Is there a page that lists the libraries included by default in the Python environment? Is there an alternative PostgreSQL library that works in Glue Spark jobs? I know it is possible to use an external library by uploading it to S3 and adding its path to the job configuration, but I would like to avoid as many manual steps as possible.

Comments:
The document talks about Python shell jobs. Are you using those, or Glue Spark jobs? - Prabhakar Reddy
What type of Glue job are you using: Python shell or Glue Spark? - Abdelrahman Maharek
Sorry, I forgot to mention the type of Glue job. I want to run it on Spark. I've updated my question to reflect that. - DmcZx

1 Answer


The document you shared covers libraries that are available only in Python shell jobs. To use this library in a Glue Spark job, you need to package it, upload the package to S3, and reference it from your Glue job so that it can be imported.
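The packaging step can be sketched roughly as follows. This is only an illustration under some assumptions: the library is pure Python and already sits in a local folder, the helper names `zip_package` and `glue_default_arguments` are made up for this example, and the S3 URI is a placeholder. Uploading the zip to S3 is left out. `--extra-py-files` is the Glue job parameter for extra Python files.

```python
import zipfile
from pathlib import Path


def zip_package(package_dir, zip_path):
    """Zip a pure-Python package so the package folder sits at the archive root.

    Hypothetical helper for illustration; only .py files are included.
    """
    package_dir = Path(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package_dir.rglob("*.py")):
            # Keep the top-level folder name (e.g. "pg8000/...") inside the zip
            # so that "import pg8000" can resolve from the archive.
            zf.write(path, path.relative_to(package_dir.parent).as_posix())
    return zip_path


def glue_default_arguments(s3_zip_uri):
    """Build the job parameter pointing a Glue Spark job at the uploaded zip."""
    return {"--extra-py-files": s3_zip_uri}
```

After uploading the zip, the dictionary returned by `glue_default_arguments("s3://my-bucket/libs/pg8000.zip")` could be passed as the job's default arguments, or the same key and value entered in the job's parameters in the console. Any dependencies of the library would need to be packaged the same way.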

There are alternatives such as pg8000, which can also be used as an external Python library. This and this explain how to package it; the same approach can also be used with the PyGreSQL library.
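Once pg8000 is importable in the job, querying PostgreSQL might look something like the sketch below. The helper names `connection_settings` and `fetch_rows`, and the host, database, and credential values in the usage note, are placeholders of mine rather than anything from the linked pages. pg8000 is imported inside the function so that the module only needs to be present where the query actually runs.

```python
def connection_settings(host, database, user, password, port=5432):
    """Collect keyword arguments for pg8000.connect (hypothetical helper)."""
    return {
        "host": host,
        "database": database,
        "user": user,
        "password": password,
        "port": port,
    }


def fetch_rows(query, settings):
    """Run a query with pg8000 and return all rows."""
    # Imported here so this module loads even where pg8000 is not installed.
    import pg8000

    conn = pg8000.connect(**settings)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        # Always release the connection, even if the query fails.
        conn.close()
```

For example, `fetch_rows("SELECT 1", connection_settings("db.example.internal", "mydb", "glue_user", "secret"))` would run on the driver; the Glue job still needs network access to the database.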

Also, this has more information on how to connect to on-premises PostgreSQL databases.