Connecting to Spark SQL on EMR using JDBC

Question

I have Spark running on EMR and have been trying, without success, to connect to Spark SQL from SQL Workbench using the Hive JDBC drivers. I have started the Thrift server on the EMR cluster, and I can connect to Hive on port 10000 (the default) from Tableau and SQL Workbench. When I run a query that way, however, it fires a Tez/Hive job, and I want the query to run on Spark. From within the EMR box, I can connect to Spark SQL using beeline and run a query as a Spark job. The resource manager shows the beeline query running as a Spark job, while the query run through SQL Workbench runs as a Hive/Tez job.

When I checked the logs, I found that the Thrift server for Spark was listening on port 10001 (the default). When I fire up beeline, log entries appear for the connection and for the SQL I run. However, when I use the same connection parameters from SQL Workbench or Tableau, the connection fails with an exception that gives little detail; it only says that the connection ended.

I also tried running the Thrift server on a custom port by passing the port parameters at startup. Beeline works against that port, but the JDBC connection still does not.
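As a first diagnostic for the symptoms above, it can help to confirm that both endpoints are reachable from the client machine before involving a JDBC driver. The following is a minimal Python sketch, not part of the original setup: the host name `emr-master.example.com` is a placeholder, the helper names are made up for illustration, and the ports are the defaults mentioned in the question. It assumes the Spark Thrift server accepts the same `jdbc:hive2://` URL form as HiveServer2.

```python
import socket


def hive2_jdbc_url(host, port, database="default"):
    """Build a HiveServer2-style JDBC URL for the given endpoint."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def port_is_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    host = "emr-master.example.com"  # hypothetical host name, replace with yours
    # Ports taken from the question: 10000 for Hive, 10001 for Spark SQL.
    endpoints = {"Hive": 10000, "Spark SQL": 10001}
    for label, port in endpoints.items():
        state = "reachable" if port_is_open(host, port) else "NOT reachable"
        print(f"{label}: {hive2_jdbc_url(host, port)} is {state}")
```

A reachable port only shows that something is listening. If the port is open but the JDBC client still reports that the connection ended, the problem is more likely on the server side, which the Thrift server logs should help narrow down.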

Any help resolving this issue would be appreciated.

Murali Murali · Accepted Answer · 2016-11-22T21:24:21

I was able to resolve the issue and can now connect to Spark SQL from Tableau. The reason I could not connect was that we were bringing up the Thrift service as root. I'm not sure why that matters, but I had to change the permissions on the log folder to the current user (not root) and then bring up the Thrift service as that user, after which I could connect without any issues.
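The permission check described above could be sketched in Python as follows. This is only an illustration: the path `/var/log/spark` is an assumption (the answer does not name the log folder), and `check_log_dir` is a hypothetical helper. It reports who owns the directory, which user is running the script, and whether that user can write to it.

```python
import os
import pwd


def user_name(uid):
    """Resolve a numeric uid to a user name, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_log_dir(path):
    """Report ownership and writability of a log directory for the current user."""
    st = os.stat(path)
    return {
        "owner": user_name(st.st_uid),
        "current_user": user_name(os.getuid()),
        # W_OK | X_OK: the user can create files inside the directory.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    log_dir = "/var/log/spark"  # assumed location, adjust to your cluster
    if os.path.isdir(log_dir):
        print(check_log_dir(log_dir))
    else:
        print(f"{log_dir} does not exist on this machine")
```

Running this as the user that will start the Thrift service shows whether that user can write to the folder; if `writable` is false, the ownership or permissions would need changing before the service is started.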
