AWS Glue psycopg2 installation

Question

I'm trying to run code that uses psycopg2 to work with a Redshift instance. I tried importing a wheel file, since wheels are supported in Glue Python jobs. I can see the library being installed on the endpoint when the job runs, but then I get an error:

import boto3
import psycopg2

Aug 4, 2020, 1:24:06 PM Pending execution
Processing ./glue-python-libs-92ng4pcb/psycopg2-2.8.5-cp36-cp36m-win_amd64.whl
Installing collected packages: psycopg2
Successfully installed psycopg2-2.8.5
Considering file without prefix as a python extra file s3://gluelibraries/boto3.zip

WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.

2020-08-04T13:24:44.831+02:00
Traceback (most recent call last):
  File "/tmp/runscript.py", line 123, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-1t08aq9n/postloading.py", line 6, in <module>
  File "/glue/lib/installation/psycopg2/__init__.py", line 51, in <module>
    from psycopg2._psycopg import (                     # noqa
ModuleNotFoundError: No module named 'psycopg2._psycopg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 142, in <module>
    raise e_type(e_value).with_traceback(new_stack)
  File "/tmp/glue-python-scripts-1t08aq9n/postloading.py", line 6, in <module>
  File "/glue/lib/installation/psycopg2/__init__.py", line 51, in <module>
    from psycopg2._psycopg import (                     # noqa
ModuleNotFoundError: No module named 'psycopg2._psycopg'

In theory, Glue jobs in Python (unlike PySpark jobs) should support libraries that are not pure Python.

sumanth shetty · Accepted Answer · 2020-08-04T11:42:16

I have faced a similar issue with the psycopg2 package. It has to do with compatibility between the package build and the Python runtime that is loading the psycopg2 module.

See this thread: Using psycopg2 with Lambda to Update Redshift (Python). Hopefully it leads you to a solution.
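To make the compatibility point concrete: the wheel in the log above is tagged win_amd64, while the paths in the traceback (/usr/local/lib/python3.6, /tmp) indicate the job runs on Linux. That mismatch would explain why pip reports a successful install but the compiled psycopg2._psycopg module cannot be found. Below is a minimal sketch for checking a wheel's tags before uploading it. It assumes the standard wheel filename layout, and the helper names wheel_tags and looks_linux_compatible are hypothetical, not part of pip or Glue.

```python
def wheel_tags(filename):
    """Return the (python, abi, platform) tags encoded in a wheel filename.

    Assumes the standard layout:
    name-version[-build]-python-abi-platform.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def looks_linux_compatible(filename):
    """Rough check: True for pure-Python ('any') or Linux-tagged wheels."""
    platform_tag = wheel_tags(filename)[2]
    # A platform tag may list several platforms separated by dots.
    return any(
        p == "any" or p.startswith(("linux", "manylinux"))
        for p in platform_tag.split(".")
    )


# The wheel from the job log above: built for 64-bit Windows.
print(wheel_tags("psycopg2-2.8.5-cp36-cp36m-win_amd64.whl"))
print(looks_linux_compatible("psycopg2-2.8.5-cp36-cp36m-win_amd64.whl"))

# Illustrative Linux-tagged filename, for comparison only.
print(looks_linux_compatible("psycopg2_binary-2.8.5-cp36-cp36m-manylinux1_x86_64.whl"))
```

If the check fails for the wheel you are uploading, a wheel built for Linux and the same Python version (cp36 here) is the likelier candidate to work. This is only a filename check and does not guarantee the job will run.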
