Cannot get AWS Data Pipeline connected to Redshift

Question

I have a query I'd like to run regularly in Redshift. I've set up an AWS Data Pipeline for it.

My problem is that I cannot figure out how to reach Redshift from the pipeline. I keep getting "Unable to establish connection" errors. I have an Ec2Resource, and I've tried giving it a subnet from our cluster's VPC and the Security Group Id that Redshift uses, while also adding that sg-id to the inbound rules. No luck.

Does anyone have a from-scratch way to set up a data pipeline to run against Redshift?

How I currently have my pipeline set up

RedshiftDatabase
- Connection String: jdbc:redshift://[host]:[port]/[database]
- Username, Password
Ec2Resource
- Resource Role: DataPipelineDefaultResourceRole
- Role: DataPipelineDefaultRole
- Terminate after: 20 minutes
SqlActivity
- Database: [database] (from Connection String)
- Runs on: Ec2Resource
- Script: SQL query

Error message

Unable to establish connection to jdbc:postgresql://[host]:[port]/[database] Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
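
"Connection refused" usually points at network reachability rather than credentials, so one way to narrow it down is a plain TCP check against the cluster endpoint. The sketch below is only an illustration: `can_connect` is a hypothetical helper, and the host and port are placeholders for the values in the connection string.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

Running something like `can_connect("[host]", 5439)` (5439 being the default Redshift port, assuming it was not changed) from an instance in the same subnet and security group as the Ec2Resource would show whether the port is reachable at all before JDBC gets involved.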

You need to give more details about how you are running the query. It is likely that the IP address of the resource running inside Data Pipeline needs to be added to the Redshift security group. I think you know that already. - Jon Scott
@JonScott what sort of details? Yes, the VPC is set up to only allow connections from a whitelist of IPs or security groups. So how do I find/set a security group for the Data Pipeline? - ScottieB
Details about how you are using Data Pipeline to access Redshift. What Data Pipeline resource type? Ec2Resource or SqlActivity? - Jon Scott
@JonScott I took a stab at describing it, added as an edit above. I've seen stuff online that says you do not need an Ec2Resource, but if I just have my SqlActivity I get blocking errors that say SqlActivity needs either workerGroup or runsOn. - ScottieB

ScottieB · Accepted Answer · 2019-06-24T20:14:13

OK, so the answer lies in Security Groups. I had to find the Security Group my Redshift cluster is in, and then add it as the value of the "Security Group" parameter on the Ec2Resource in the Data Pipeline.

Ec2Resource
- Resource Role: DataPipelineDefaultResourceRole
- Role: DataPipelineDefaultRole
- Terminate after: 20 minutes
- Security Group: sg-XXXXX [pull from Redshift]
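
For anyone defining the pipeline programmatically rather than in the console, the setup above could be sketched roughly as below. This is a sketch under assumptions, not a complete definition: the helper names (`field`, `build_pipeline_objects`) and the object ids are made up for illustration, and it assumes the console's "Security Group" parameter corresponds to the `securityGroupIds` field (with `subnetId` for the subnet). Schedule and default objects are left out.

```python
def field(key, value, ref=False):
    """Build one field entry; ref=True marks a reference to another pipeline object."""
    return {"key": key, ("refValue" if ref else "stringValue"): value}


def build_pipeline_objects(connection_string, username, password, sql,
                           redshift_sg_id, subnet_id=None):
    """Return the three objects from the post in the key/value field layout."""
    database = {
        "id": "RedshiftDatabaseObj",  # hypothetical id
        "name": "RedshiftDatabaseObj",
        "fields": [
            field("type", "RedshiftDatabase"),
            field("connectionString", connection_string),
            field("username", username),
            field("*password", password),
        ],
    }

    ec2_fields = [
        field("type", "Ec2Resource"),
        field("resourceRole", "DataPipelineDefaultResourceRole"),
        field("role", "DataPipelineDefaultRole"),
        field("terminateAfter", "20 Minutes"),
        # The fix from the answer: launch the instance in the security
        # group that the Redshift cluster uses (assumed field name).
        field("securityGroupIds", redshift_sg_id),
    ]
    if subnet_id:
        # Optional: a subnet from the cluster's VPC, as tried in the question.
        ec2_fields.append(field("subnetId", subnet_id))
    ec2 = {"id": "Ec2ResourceObj", "name": "Ec2ResourceObj", "fields": ec2_fields}

    activity = {
        "id": "SqlActivityObj",  # hypothetical id
        "name": "SqlActivityObj",
        "fields": [
            field("type", "SqlActivity"),
            field("database", "RedshiftDatabaseObj", ref=True),
            field("runsOn", "Ec2ResourceObj", ref=True),
            field("script", sql),
        ],
    }
    return [database, ec2, activity]
```

The returned list has the shape that boto3's `put_pipeline_definition` takes as `pipelineObjects`; a real pipeline would also need its default and schedule settings added before activation.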
