I've run my Python script locally and it works. It makes a simple connection to a PostgreSQL database and performs a query. For that I need to import the following Python modules:

import pandas as pd
import pandas.io.sql as psql
import boto3
import psycopg2 as pg
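
For reference, the script boils down to something like the following sketch (the connection parameters here are placeholders, not my real endpoint or credentials):

import pandas.io.sql as psql
import psycopg2 as pg

# Placeholder connection parameters -- replace with the real host and credentials.
conn = pg.connect(
    host="my-db.example.com",
    port=5432,
    dbname="mydb",
    user="myuser",
    password="mypassword",
)

# Run a simple query and load the result into a pandas DataFrame.
df = psql.read_sql("SELECT * FROM my_table LIMIT 10;", conn)
print(df.head())

conn.close()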

The first three are already available in AWS Glue jobs. psycopg2 has to be packaged as an .egg file, as demonstrated at https://www.helicaltech.com/external-python-libraries-aws-glue-job/, and then made available in an S3 bucket.
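
The upload step itself is just a boto3 call along these lines (bucket name, key, and .egg filename below are hypothetical):

import boto3

# Hypothetical bucket, key, and local filename -- replace with your own.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="psycopg2-2.8.6-py3.6.egg",
    Bucket="my-glue-libs-bucket",
    Key="libs/psycopg2-2.8.6-py3.6.egg",
)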

So I created my AWS Glue job with my simple script and added the .egg file as my Python library path. When the code runs, it recognizes the first three modules. For the last one, it opens the .egg file successfully. However, when it reaches the __init__.py file, it crashes on the first thing it sees, which is:

from psycopg2._psycopg import (...)

What am I doing wrong? Should the __init__.py be empty? I've tried other libraries and it always crashes on the first line of the __init__.py.

Comment: I see that the file _psycopg.pyd is the one being called and not being found. Can it be that AWS is not able to read this type of file? – DMBorges

1 Answer

AWS Glue doesn't support .egg files. Instead, create a .zip file for the Python libraries you want to use. As documented by AWS Glue:

Unless a library is contained in a single .py file, it should be packaged in a .zip archive
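
Once the .zip is in S3, one way to point a Glue job at it is through the --extra-py-files default argument. A sketch using boto3, with hypothetical job name, role, and S3 paths:

import boto3

glue = boto3.client("glue")

# Hypothetical names, role, and S3 paths -- replace with your own.
glue.create_job(
    Name="my-postgres-query-job",
    Role="MyGlueServiceRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-glue-scripts-bucket/query_postgres.py",
    },
    DefaultArguments={
        # The .zip archive with the pure-Python libraries, uploaded to S3.
        "--extra-py-files": "s3://my-glue-libs-bucket/libs/my_libs.zip",
    },
)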

Also, AWS Glue supports only pure Python modules. Refer to the following quote from the AWS documentation:

You can use Python extension modules and libraries with your AWS Glue ETL scripts as long as they are written in pure Python. C libraries such as pandas are not supported at the present time, nor are extensions written in other languages.

Reference: Using Python libraries with AWS Glue