0
votes

Trying to run an AWS Glue Python Shell job, but it gives me a Connect Timeout error

Error Image : https://i.stack.imgur.com/MHpHg.png

Script : https://i.stack.imgur.com/KQxkj.png

2 Answers

0
votes

It looks like you haven't added a Secrets Manager endpoint to your VPC. Traffic from the job does not leave the AWS network, so there is no internet access inside your Glue job's VPC and the call to the public Secrets Manager endpoint times out. If you want to connect to Secrets Manager, you need to add a VPC endpoint for it to your VPC.

Refer to the documentation on how to add this endpoint to your VPC, and make sure your security groups are configured properly.
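
For illustration, here is a minimal sketch of creating such an endpoint with boto3. The function names are hypothetical, and the VPC, subnet, and security group IDs are placeholders you would replace with your own. It assumes an interface endpoint with private DNS enabled, which may not match every setup.

```python
def secretsmanager_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build the keyword arguments for EC2 create_vpc_endpoint (hypothetical helper)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Regional service name of the Secrets Manager interface endpoint.
        "ServiceName": f"com.amazonaws.{region}.secretsmanager",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Assumption: private DNS is wanted, so the default hostname
        # resolves to the endpoint from inside the VPC.
        "PrivateDnsEnabled": True,
    }


def create_secretsmanager_endpoint(region, vpc_id, subnet_ids, security_group_ids):
    """Create the endpoint; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    params = secretsmanager_endpoint_params(
        region, vpc_id, subnet_ids, security_group_ids
    )
    return ec2.create_vpc_endpoint(**params)
```

The security group attached to the endpoint should allow inbound HTTPS (port 443) from the security group used by the Glue job's connection.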

0
votes

AWS Glue Git Issue

Hi, we got the AWS Glue Python Shell job working with all its dependencies as follows. The Glue job depends on awscli as well as boto3.

AWS Glue Python Shell with Internet

Add awscli and boto3 whl files to Python library path during Glue Job execution. This option is slow as it has to download and install dependencies.

  1. Download the following whl files
  2. Upload the files to the S3 bucket in your given Python library path
  3. Add the S3 whl file paths to the Python library path. Give the full S3 path of each whl file, separated by commas
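
As a rough sketch of the last step, the helper below builds the comma-separated value from full S3 paths. The function name and the bucket in the usage note are made up for illustration. My understanding is that the Python library path in the console corresponds to the `--extra-py-files` job parameter, but treat that mapping as an assumption and check it against your job definition.

```python
def python_library_path(*s3_uris):
    """Join full S3 whl paths into one comma-separated string (hypothetical helper)."""
    for uri in s3_uris:
        # Each entry must be a complete S3 path to a single whl file.
        if not uri.startswith("s3://") or not uri.endswith(".whl"):
            raise ValueError(f"expected a full s3://...whl path, got: {uri}")
    # No spaces around the commas, so the paths are not altered.
    return ",".join(s3_uris)
```

For example, `python_library_path("s3://my-bucket/libs/a.whl", "s3://my-bucket/libs/b.whl")` gives one string with the two paths joined by a comma, which can be pasted into the Python library path field.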

AWS Glue Python Shell without Internet connectivity

Reference: AWS Wrangler Glue dependency build

  1. We followed the steps mentioned above for awscli and boto3 whl files
  2. Below is the latest requirements.txt compiled for the newest versions
colorama==0.4.3
docutils==0.15.2
rsa==4.5.0
s3transfer==0.3.3
PyYAML==5.3.1
botocore==1.19.23
pyasn1==0.4.8
jmespath==0.10.0
urllib3==1.26.2
python_dateutil==2.8.1
six==1.15.0
  3. Download the dependencies to the libs folder
pip download -r requirements.txt -d libs
  4. Move the original main whl files (awscli and boto3) into the libs directory as well
  5. Package as a zip file
cd libs
zip ../boto3-depends.zip *
  6. Upload boto3-depends.zip to S3 and add its path to the Glue job's Referenced files path. Note: it is the Referenced files path, not the Python library path.

  7. Placeholder code to install the latest awscli and boto3 and load them into the AWS Glue Python Shell job.

import os.path
import subprocess
import sys

# borrowed from https://stackguides.com/questions/48596627/how-to-import-referenced-files-in-etl-scripts
def get_referenced_filepath(file_name, matchFunc=os.path.isfile):
    # Search every sys.path entry for the referenced file
    for dir_name in sys.path:
        candidate = os.path.join(dir_name, file_name)
        if matchFunc(candidate):
            return candidate
    raise Exception("Can't find file: {}".format(file_name))

zip_file = get_referenced_filepath("boto3-depends.zip")

# Assumed command: extract the wheels into the current working directory
subprocess.run(["unzip", zip_file], check=True)

# Can't install --user, or without "-t ." because of permissions issues on the filesystem
# Assumed command: install the extracted wheels offline into the current directory
subprocess.run("pip3 install --no-index --find-links=. -t . *.whl", shell=True, check=True)

# Additional code as part of AWS Thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
sys.path.insert(0, '/glue/lib/installation')
# Drop the already loaded boto modules so the import below picks up the new version
keys = list(sys.modules.keys())
for k in keys:
    if 'boto' in k:
        del sys.modules[k]

import boto3
print('boto3 version')
print(boto3.__version__)
  8. Check that the code works with the latest AWS CLI API
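
If the `zip` command is not available on the build machine, the packaging step can be approximated in Python. This is only a sketch: `package_wheels` is a hypothetical helper, and it assumes a flat libs folder like the one `pip download` produces, with every file stored at the top level of the archive.

```python
import zipfile
from pathlib import Path


def package_wheels(libs_dir, zip_path):
    """Zip every regular, non-hidden file in libs_dir at the archive root."""
    libs = Path(libs_dir)
    files = sorted(
        p for p in libs.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # arcname drops the folder prefix, like running zip from inside libs
            zf.write(f, arcname=f.name)
    # Return the stored names so the caller can check what was packaged
    return [f.name for f in files]
```

Calling `package_wheels("libs", "boto3-depends.zip")` from the folder that contains libs should produce an archive equivalent to the one from the zip step above.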

Thanks Sarath