2
votes

We have a secured Dataproc cluster, and we can successfully SSH into it with individual user IDs using the command:

gcloud compute ssh cluster-name --tunnel-through-iap

But when we create a compute profile, attach it to the Data Fusion instance, and configure the pipeline to run with it, the run fails with a connection timeout:

java.io.IOException: com.jcraft.jsch.JSchException: java.net.ConnectException: Connection timed out (Connection timed out)
    at io.cdap.cdap.common.ssh.DefaultSSHSession.<init>(DefaultSSHSession.java:88) ~[na:na]
    at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillPreparer.lambda$start$0(RemoteExecutionTwillPreparer.java:436) ~[na:na] 

How can we configure a Data Fusion pipeline to run on a secured Dataproc cluster?

1
Hi Phaneendra, just to verify, did you create a new profile with the Remote Hadoop provisioner? Also, is the Cloud Data Fusion instance public or private? - Edwin Elia
The Data Fusion instance was created with a private IP. The issue was with the firewall rules; once they were fixed, the pipeline started working. - phaneendra kumar

1 Answer

0
votes

Some information to give more context on this question:

  • Given the --tunnel-through-iap option, you are most probably using Tunneling with SSH, and cluster-name is the name of the instance in the Dataproc cluster that you want to connect to. That documentation also describes the --internal-ip option, which connects to an instance only through its internal IP.
  • The Data Fusion documentation explains how to create an instance with a private IP address to limit access to it.

Hence, a private IP instance and the --internal-ip option could be a good combination for connecting to your instance while keeping the cluster secured, once the firewall rules are correctly configured.
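
For illustration, here is a rough sketch of what such a firewall rule might look like. The thread only says that "firewall rules" were the problem, so this assumes the missing piece was ingress on the SSH port (tcp:22) from the IP range allocated to the private Data Fusion instance, which would be consistent with the SSH timeout in the stack trace. The function name, the default rule name, and the example network and range are hypothetical placeholders, not values from the question.

```shell
# Sketch only: allow SSH (tcp:22) into the cluster's VPC network from the
# IP range used by the private Data Fusion instance.
# Arguments (all hypothetical placeholders to replace with your own values):
#   $1  VPC network the Dataproc cluster runs in
#   $2  CIDR range allocated to the Data Fusion instance
#   $3  firewall rule name (optional)
allow_data_fusion_ssh() {
  local network="$1"
  local source_range="$2"
  local rule_name="${3:-allow-ssh-from-data-fusion}"

  gcloud compute firewall-rules create "$rule_name" \
    --network="$network" \
    --direction=INGRESS \
    --allow=tcp:22 \
    --source-ranges="$source_range"
}

# Example call (values are made up):
# allow_data_fusion_ssh my-vpc 10.0.0.0/22
```

After the rule is in place, running gcloud compute ssh cluster-name --internal-ip from a VM in the same network may be a quick way to confirm that port 22 is reachable over internal IPs before re-running the pipeline.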