I am using the Glue console, not a dev endpoint. The Glue job is able to access the Glue catalog and a table using the code below:
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="glue-db", table_name="countries")
print("Table Schema: {}".format(datasource0.schema()))
# show() prints the records itself and returns None, so call it on its own line
datasource0.show()
Now I want to get the metadata for all tables in the Glue database glue-db. I could not find a function for this in the awsglue.context API, so I am using boto3:
import boto3

client = boto3.client('glue', 'eu-central-1')
responseGetDatabases = client.get_databases()
databaseList = responseGetDatabases['DatabaseList']
for databaseDict in databaseList:
    databaseName = databaseDict['Name']
    print("databaseName:{}".format(databaseName))
    responseGetTables = client.get_tables(DatabaseName=databaseName,
                                          MaxResults=123)
    print("responseGetTables:{}".format(responseGetTables))
    tableList = responseGetTables['TableList']
    for tableDict in tableList:
        tableName = tableDict['Name']
        print("-- tableName:{}".format(tableName))
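A single get_databases or get_tables call returns only one page of results, so listing every table may need pagination. The following is a minimal sketch of the same listing using boto3 paginators. The helper name list_table_names is hypothetical, and it takes the client as a parameter so it can be exercised without AWS access. It makes the same calls to the Glue endpoint, so it does not address the timeout.

```python
def list_table_names(glue_client):
    """Return {database_name: [table_name, ...]} for every database in the catalog."""
    tables_by_database = {}
    # Paginators follow NextToken automatically until all pages are consumed.
    database_pages = glue_client.get_paginator("get_databases").paginate()
    for database_page in database_pages:
        for database in database_page["DatabaseList"]:
            database_name = database["Name"]
            table_names = []
            table_pages = glue_client.get_paginator("get_tables").paginate(
                DatabaseName=database_name)
            for table_page in table_pages:
                for table in table_page["TableList"]:
                    table_names.append(table["Name"])
            tables_by_database[database_name] = table_names
    return tables_by_database
```

With the client from the question, it would be called as list_table_names(boto3.client('glue', 'eu-central-1')).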
The boto3 code runs in a Lambda function, but fails within the Glue ETL job with the following error:
botocore.vendored.requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='glue.eu-central-1.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to glue.eu-central-1.amazonaws.com timed out. (connect timeout=60)'))
The problem seems to be in the environment configuration. The Glue VPC has two subnets. The private subnet has an S3 endpoint for Glue and allows inbound traffic from the RDS security group. The public subnet, in the same Glue VPC, has a NAT gateway. The private subnet is reachable through the NAT gateway. I am not sure what I am missing here.
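Since the error is a connect timeout to glue.eu-central-1.amazonaws.com on port 443, one way to narrow this down is to test plain TCP reachability from inside the job, independent of boto3. The following is a minimal sketch using only the standard library. The helper name can_connect and the 5-second default timeout are assumptions for illustration.

```python
import socket


def can_connect(host, port=443, timeout_seconds=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        connection = socket.create_connection((host, port), timeout=timeout_seconds)
    except socket.error:
        # Covers DNS failures, refused connections, and connect timeouts.
        return False
    connection.close()
    return True
```

Printing can_connect("glue.eu-central-1.amazonaws.com") at the top of the Glue job would show whether the subnet the job runs in has any route to the Glue API endpoint. A False result would point at routing (NAT gateway, route tables, security groups) rather than at the boto3 code.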