
I am using the Glue console, not a dev endpoint. The Glue job can access the Glue Data Catalog and its tables with the code below:

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = 
"glue-db", table_name = "countries")
print "Table Schema:", datasource0.schema()
print "datasource0"
datasource0.show()  # show() prints the records itself and returns None

Now I want to get the metadata for all tables in the Glue database glue-db. I could not find a function for this in the awsglue.context API, so I am using boto3:

import boto3

client = boto3.client('glue', region_name='eu-central-1')
responseGetDatabases = client.get_databases()
databaseList = responseGetDatabases['DatabaseList']
for databaseDict in databaseList:
    databaseName = databaseDict['Name']
    print("databaseName:{}".format(databaseName))
    # List the tables registered in this database
    responseGetTables = client.get_tables(DatabaseName=databaseName)
    tableList = responseGetTables['TableList']
    print("response Object{0}".format(responseGetTables))
    for tableDict in tableList:
        tableName = tableDict['Name']
        print("-- tableName:{}".format(tableName))

This code runs in a Lambda function, but fails inside a Glue ETL job with the following error:

botocore.vendored.requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='glue.eu-central-1.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to glue.eu-central-1.amazonaws.com timed out. (connect timeout=60)'))

The problem seems to be in the environment configuration. The Glue VPC has two subnets. The private subnet has an S3 endpoint for Glue and allows inbound traffic from the RDS security group. The public subnet, in the same Glue VPC, has a NAT gateway. The private subnet is reachable through the NAT gateway. I am not sure what I am missing here.

Can you verify that port 443 is open to the internet? The job needs to reach other services for this to work. Also try passing the region along with client = boto3.client('glue') - Prabhakar Reddy
Yes, port 443 is open and I have added the region, but the call still times out after 15 minutes and the job fails. The security group of the Glue VPC looks like this. I have allowed almost all traffic for testing purposes, but still cannot connect to Glue using boto3. Rules (type, protocol, port range, source):
All TCP, TCP, 0 - 65535
All TCP, TCP, 0 - 65535, self reference
PostgreSQL, TCP, 5432, SG of the peered VPC
All traffic, All, All, self-referencing group
All traffic, All, All, SG of the peered VPC
- Uraish
Hi @Uraish, did you find a solution for this? I'm facing the same problem and would very much appreciate some help. Thanks. - crojassoto
Same issue here. @Uraish, if you found a solution, please update. Thanks! - Ryan Fisher
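
The port check suggested in the comments can be run from inside the job itself, which separates a network problem from a boto3 problem. Below is a minimal sketch using only the Python 3 standard library. The helper name can_connect and the 5-second timeout are illustrative choices, not part of any Glue API.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        return False


# Example call from inside the job (host name taken from the error message):
# print(can_connect("glue.eu-central-1.amazonaws.com"))
```

If this returns False inside the Glue job but True from a Lambda function, the subnet routing or security group used by the job is the likely cause rather than the boto3 code.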

3 Answers


Try using a proxy while creating the boto3 client:

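import boto3
from botocore.config import Config
from pyhocon import ConfigFactory

service_name = 'glue'
region = 'eu-central-1'  # region from the question; adjust as needed

default = ConfigFactory.parse_file('glue-default.conf')
override = ConfigFactory.parse_file('glue-override.conf')

# Prefer the proxy settings from the override file, falling back to the defaults
host = override.get('proxy.host', default.get('proxy.host', None))
port = override.get('proxy.port', default.get('proxy.port', None))

config = Config()

if host and port:
    # Route HTTPS calls to the Glue API through the proxy
    config.proxies = {'https': '{}:{}'.format(host, port)}

client = boto3.Session(region_name=region).client(service_name=service_name, config=config)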

glue-default.conf and glue-override.conf are deployed by Glue to the /tmp directory on the cluster during spark-submit.

I had a similar issue and solved it the same way, using the public library from Glue: s3://aws-glue-assets-eu-central-1/scripts/lib/utils.py


Can you please try creating the boto3 client as below, specifying the region explicitly?

client = boto3.client('glue',region_name='eu-central-1')

I had a similar problem when running this command from a Glue Python Shell job.

So I created a VPC endpoint (VPC -> Endpoints) for the Glue service (service name: "com.amazonaws.eu-west-1.glue") and assigned it to the same subnet and security group as the Glue connection used by the Glue Python Shell job.
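
The same kind of endpoint can also be created from code rather than the console. The following is a minimal sketch with boto3, not a tested deployment script. The function names are hypothetical, the VPC, subnet and security group IDs are placeholders you would replace with your own, and enabling private DNS assumes that DNS support and DNS hostnames are turned on for the VPC.

```python
def build_glue_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Build the keyword arguments for ec2.create_vpc_endpoint (interface endpoint for Glue)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": "com.amazonaws.{}.glue".format(region),
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # With private DNS, the regular Glue host name resolves to the endpoint
        "PrivateDnsEnabled": True,
    }


def create_glue_endpoint(region, vpc_id, subnet_ids, security_group_ids):
    """Create the endpoint and return its ID. Needs boto3 and EC2 permissions."""
    import boto3  # imported lazily so the request builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    request = build_glue_endpoint_request(region, vpc_id, subnet_ids, security_group_ids)
    response = ec2.create_vpc_endpoint(**request)
    return response["VpcEndpoint"]["VpcEndpointId"]


# Example with placeholder IDs (use the subnet and security group of the Glue connection):
# create_glue_endpoint("eu-west-1", "vpc-PLACEHOLDER", ["subnet-PLACEHOLDER"], ["sg-PLACEHOLDER"])
```

The security group passed here has to allow inbound HTTPS (port 443) from the job, otherwise the boto3 call will still time out.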