
I am trying to use R on AWS to connect to our cluster running Cloudera Hadoop, following the steps described here: http://blog.cloudera.com/blog/2013/12/how-to-do-statistical-analysis-with-impala-and-r/

So far, I can initialize the JDBC driver, but I am not able to connect to Impala.

[Screenshot: connection attempt from R]

After some investigation, I can see that the Impala daemon is running on all our worker nodes, and the ports are configured as follows:

[Screenshot: Impala port configuration]

I also logged in to one of the worker nodes and checked which ports are listening. Port 21050 is listening:

[Screenshot: listening ports on the worker node]

In the RImpala connect call, I am using the public IP of the worker node, but I still cannot connect. I can use the public IP with port 25000 to see the Impala web UI, but I cannot connect to port 21050, which listens for JDBC requests. Can anyone help me with this?
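To separate a network problem from a driver problem, it can help to test plain TCP reachability of both ports from the machine running R. The following is only a minimal sketch in Python: the helper names `port_is_reachable` and `check_ports` are made up for illustration, and a successful TCP connection says nothing about whether the JDBC driver itself is compatible.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True/False depending on whether it accepts TCP connections."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

Calling `check_ports(worker_ip, [25000, 21050])` with the worker node's public IP (a value you supply) would show whether 21050 is reachable from outside at all. If 25000 is reachable and 21050 is not, a firewall or security rule between the two machines is one possibility worth ruling out; if both are reachable, the problem is more likely on the driver side.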


1 Answer


In case anyone is looking for help, here is the answer I got from Cloudera support.

"The problem is not with Impala or the Cloudera distro. The problem is with the driver being used by RImpala. RImpala is using the Hive JDBC driver. If you check the source code at https://github.com/Mu-Sigma/RImpala/blob/master/java/src/main/java/com/musigma/ird/bigdata/RImpala.java you will see that the class being used as the driver is “org.apache.hive.jdbc.HiveDriver”. So the RImpala package is outdated and has not been updated to work."