If I have a service that needs to have IP whitelisting, how can I connect AWS Glue to it? I read that I seem to be able to put AWS Glue in a private VPC and configure a NAT gateway. I can then allow my NAT IP to connect to the service. However, I cannot find anyway to configure my Glue Job to run inside a subnet/VPC. How do I do this?
2 Answers
The job will run automatically in a VPC if you attach a Database connection to a resource which is inside the VPC. For example, I have a job that reads data from S3 and writes into an Aurora database in a private VPC using a Glue connection (configured as JDBC).
That job automatically has access to all the resources inside the VPC, as explained here. If the VPC has enabled NAT for external access, then your job can also take advantage of that.
Note if you use a connection that requires VPC and you use S3, you will need to enable an endpoint for S3 in that VPC as well.
The answer for your question is answered here -- https://stackoverflow.com/a/64414639 Note that Glue is a 'managed' service so it does not release any list IP addresses such that can be whitelisted. As a workaround you can use a EC2 instance to run your custom python OR pyspark script and whitelist the IP address of that particular EC2 instance