0
votes

Using AWS Glue, I've created a Glue version 2.0 job with the parameters --additional-python-modules = psycopg2-binary and --python-modules-installer-option = --upgrade, but it fails to import the module.

    com.amazonaws.services.glue.PythonModuleInstaller [main] Collecting psycopg2-binary
      Downloading https://files.pythonhosted.org/packages/14/65/223a5b4146b1d5d5ab66f16ef194916a1ed9720da1f118d7bfb60b8f2bea/psycopg2-binary-2.9.1.tar.gz (380kB)
        Complete output from command python setup.py egg_info:
        running egg_info
        creating pip-egg-info/psycopg2_binary.egg-info
        writing pip-egg-info/psycopg2_binary.egg-info/PKG-INFO
        writing dependency_links to pip-egg-info/psycopg2_binary.egg-info/dependency_links.txt
        writing top-level names to pip-egg-info/psycopg2_binary.egg-info/top_level.txt
        writing manifest file 'pip-egg-info/psycopg2_binary.egg-info/SOURCES.txt'

        Error: pg_config executable not found.

        pg_config is required to build psycopg2 from source. Please add the directory
        containing pg_config to the $PATH or specify the full executable path with the
        option:

            python setup.py build_ext --pg-config /path/to/pg_config build ...

        or with the pg_config option in 'setup.cfg'.

        If you prefer to avoid building psycopg2 from source, please install the PyPI
        'psycopg2-binary' package instead.

        For further information please check the 'doc/src/install.rst' file (also at
        <https://www.psycopg.org/docs/install.html>).

What could be the cause?

2
I don't think you need the parameter --python-modules-installer-option = --upgrade. Have you tried it without that? - jonlegend
Yes, I tried without --upgrade, with the same result. pg8000, by contrast, imports without any problems. - Yousra

2 Answers

1
votes

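Try pinning an older version of psycopg2-binary; I had the same issue. The log shows pip downloading the 2.9.1 source archive (.tar.gz) and trying to build it, which is why it asks for pg_config. An older release such as 2.8.6 should install from a prebuilt wheel instead. The job parameter would look like this: --additional-python-modules = psycopg2-binary==2.8.6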
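
If you create or update the job from code rather than the console, the pinned parameter can be built like this. This is only a minimal sketch: the helper name `build_default_arguments` is made up for illustration, and it assumes Glue takes a single comma-separated string for --additional-python-modules.

```python
def build_default_arguments(modules):
    """Return Glue job arguments asking the installer to pip-install `modules`.

    `modules` is a list of pip requirement strings, e.g. "psycopg2-binary==2.8.6".
    """
    # Assumption: Glue reads this parameter as one comma-separated string.
    return {"--additional-python-modules": ",".join(modules)}


# Pin the version suggested above instead of letting pip pick the latest release.
args = build_default_arguments(["psycopg2-binary==2.8.6"])
print(args)
```

The resulting dictionary can be passed as `DefaultArguments` when creating the job with boto3, or entered as a key/value pair under the job parameters in the console.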

0
votes

There are several ways to develop Glue jobs:

  1. Spark with Scala
  2. Spark with Python
  3. Python Shell
  4. Spark streaming

In our case we were using a Python shell job for development and ran into similar issues. Resolution: build an .egg file with Python 3.6 (currently only .egg/.whl files built with Python 3.6 are supported), place it in an S3 folder, and reference that file in the Glue job parameters.

For reference, see:

https://helicaltech.com/external-python-libraries-aws-glue-job/
https://www.blog.pythonlibrary.org/2012/07/12/python-101-easy_install-or-how-to-create-eggs/

If you are using Spark rather than Python shell, try creating a zip of the library and follow the same steps as above. See also:

https://aws.amazon.com/premiumsupport/knowledge-center/glue-job-use-external-python-libraries/
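
Both variants (an .egg for Python shell, a zip for Spark) come down to uploading the file to S3 and pointing the job at it. A rough sketch of that step, assuming boto3 is available and the library is referenced through the --extra-py-files job parameter; the bucket name, object key, and helper names are placeholders, not anything from the job in the question:

```python
def extra_py_files_argument(bucket, keys):
    """Build the --extra-py-files job argument from S3 object keys."""
    uris = [f"s3://{bucket}/{key}" for key in keys]
    # Multiple libraries are passed as one comma-separated string.
    return {"--extra-py-files": ",".join(uris)}


def upload_library(local_path, bucket, key):
    """Upload a locally built .egg/.whl/.zip to S3 (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    boto3.client("s3").upload_file(local_path, bucket, key)


# Placeholder bucket and key, for illustration only.
args = extra_py_files_argument("my-glue-libs", ["libs/mylib-0.1-py3.6.egg"])
print(args)
```

Call `upload_library` once after building the file, then add the printed key/value pair to the job parameters.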