7 votes

I'm looking for a client jdbc driver that supports Spark SQL.

I have been using Jupyter so far to run SQL statements on Spark (running on HDInsight) and I'd like to be able to connect using JDBC so I can use third-party SQL clients (e.g. SQuirreL, SQL Explorer, etc.) instead of the notebook interface.

I found an ODBC driver from Microsoft, but this doesn't help me with Java-based SQL clients. I also tried downloading the Hive JDBC driver from my cluster, but the Hive JDBC driver does not appear to support the more advanced SQL features that Spark does. For example, the Hive driver complains that it does not support joins that are not equi-joins, even though I know Spark supports this, because I've executed the same SQL in Jupyter successfully.

Questions asking for recommendations or help with finding a library or other off-site resources are off topic. – Mark Rotteveel
simba.com/drivers/spark-jdbc-odbc – Simba's Apache Spark ODBC and JDBC Drivers efficiently map SQL to Spark SQL by transforming an application's SQL query into the equivalent form in Spark SQL, enabling direct standard SQL-92 access to Apache Spark distributions. – kliew
I would try the Hive JDBC driver to talk to it. – lockwobr
@kliew – The Simba driver is expensive, and I was hoping for something that's part of the platform. It sounds like this is not available today: although the Hive driver ships as part of the stack, there is no Spark JDBC driver available in a similar capacity. – aaronsteers
@lockwobr – The problem with the Hive driver is that it doesn't accept the broader SQL features supported today by Spark. I'm confused why the Hive JDBC driver is included as a downloadable component on the server, but nothing similar exists on the Spark SQL side. Maybe it's just a matter of time?... – aaronsteers

1 Answer

1 vote

the Hive JDBC driver does not appear to support the more advanced SQL features that Spark does

Regardless of what the Hive JDBC driver itself supports, the Spark Thrift Server is fully compatible with Hive/Beeline's JDBC connection, and queries sent to it are executed by Spark SQL rather than Hive.

Therefore, that is the JAR you need to use. I have verified this works in DBVisualizer.
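As a sketch, connecting from a Java client looks like any other JDBC connection, using the `org.apache.hive.jdbc.HiveDriver` class from the Hive JDBC JAR. The host, port (10000 is the common Thrift Server default), database name, and credentials below are placeholders; adjust them for your cluster:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftClient {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (shipped in the hive-jdbc JAR).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Placeholder URL: hive2 protocol against the Spark Thrift Server.
        String url = "jdbc:hive2://localhost:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement();
             // The query runs through Spark SQL, so Spark-specific SQL
             // (e.g. non-equi joins) is accepted here.
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```

The same `jdbc:hive2://` URL and driver class can be pasted into SQuirreL or DBVisualizer after adding the Hive JDBC JAR to the client's driver path.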

The alternative solution would be to run Spark code directly in your own Java clients (rather than third-party tools), skipping the need for a JDBC connection entirely.
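That alternative can be sketched with the standard `SparkSession` API. This assumes `spark-sql` is on the classpath; `local[*]` is a placeholder master, so point it at your cluster instead:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EmbeddedSparkSql {
    public static void main(String[] args) {
        // Placeholder master "local[*]"; replace with your cluster URL.
        SparkSession spark = SparkSession.builder()
                .appName("embedded-spark-sql")
                .master("local[*]")
                .getOrCreate();

        // SQL runs through Spark SQL directly, with no JDBC layer involved.
        Dataset<Row> result = spark.sql("SELECT 1 AS one");
        result.show();

        spark.stop();
    }
}
```

This trades the convenience of off-the-shelf SQL clients for full programmatic access to the Spark API.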