I tried to run an AWS Glue Python shell job with external dependencies (such as pyathena and pytest) packaged as Python .egg or .whl files and added to the job configuration, as described in the AWS documentation: https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html.
The Glue job runs in a VPC with no internet access, and its execution failed with the error below:
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x7fd05d6a4f28>, 'Connection to pypi.org timed out. (connect timeout=15)')'
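Both warnings show pip trying to reach pypi.org and timing out. A minimal sketch for confirming that from inside the job is a plain TCP connection check. The helper name `can_reach` is hypothetical, and the 15-second default only mirrors the `connect timeout=15` in the pip warning.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 15.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, timeouts and refused connections are all OSError subclasses.
        return False
```

Calling `can_reach("pypi.org")` from the job script should return `False` in a VPC with no route to the internet, which would mean any install step that downloads from PyPI cannot succeed there.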
I also tried modifying my Python script with the following code:
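import os
import site
import importlib
from setuptools.command import easy_install

# Install target taken from the GLUE_INSTALLATION environment variable
install_path = os.environ['GLUE_INSTALLATION']
libraries = ["pyathena"]
for lib in libraries:
    # easy_install downloads each package from PyPI, so it also needs internet access
    easy_install.main(["--install-dir", install_path, lib])
# Reload site so the newly installed packages are picked up
importlib.reload(site)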
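The `easy_install` loop above still downloads from PyPI, so it fails for the same reason. One offline alternative, sketched here under the assumption that the job can already read pre-built wheel files from a local directory (for example after copying them from S3), is to call pip with `--no-index` and `--find-links` so that it never contacts a package index. The function names and directory paths are hypothetical.

```python
import importlib
import subprocess
import sys
from typing import List, Sequence


def offline_pip_command(wheel_dir: str, target_dir: str, packages: Sequence[str]) -> List[str]:
    """Build a pip command that installs packages only from local files."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never contact PyPI or any other index
        "--find-links", wheel_dir,  # look for .whl/.tar.gz files in this directory
        "--target", target_dir,     # install here instead of site-packages
        *packages,
    ]


def install_offline(wheel_dir: str, target_dir: str, packages: Sequence[str]) -> None:
    """Run the offline install, then make target_dir importable."""
    # check=True raises CalledProcessError if pip cannot satisfy a requirement.
    subprocess.run(offline_pip_command(wheel_dir, target_dir, packages), check=True)
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    importlib.invalidate_caches()
```

For example, `install_offline("/tmp/wheels", os.environ['GLUE_INSTALLATION'], ["pyathena"])` would install pyathena plus any dependencies found in `/tmp/wheels`; if a required dependency has no matching file there, pip exits with an error instead of going to PyPI.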
Running this code produced the following error:
Download error on https://pypi.org/simple/pyathena/: [Errno 99] Cannot assign requested address -- Some packages may not be found! Couldn't find index page for 'pyathena' (maybe misspelled?)
Could someone share a sample code snippet that generates .egg/.whl files for external Python packages and adds them to a Glue Python shell job?
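One way to produce the wheel files, sketched here as an assumption rather than an official Glue procedure, is to run `pip wheel` on a machine that does have internet access. `pip wheel` writes a wheel for each requested package and for its dependencies into a single directory. Because compiled wheels are platform-specific, this is best run on Linux with the same Python version the Glue job uses. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def pip_wheel_command(packages: Sequence[str], wheel_dir: str) -> List[str]:
    """Build a 'pip wheel' command that writes wheels for packages and their dependencies."""
    return [sys.executable, "-m", "pip", "wheel", "--wheel-dir", wheel_dir, *packages]


def build_wheels(packages: Sequence[str], wheel_dir: str) -> List[Path]:
    """Run pip wheel (requires internet access) and return the .whl files produced."""
    Path(wheel_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pip_wheel_command(packages, wheel_dir), check=True)
    return sorted(Path(wheel_dir).glob("*.whl"))
```

After `build_wheels(["pyathena"], "wheels")` returns, the listed `.whl` files can be uploaded to S3 and referenced from the job's `--extra-py-files` parameter, or copied into a local directory used by a `pip install --no-index --find-links` step. Shipping every dependency wheel, not just pyathena itself, is what should let the install succeed without reaching PyPI.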