Use JDBC (eg Squirrel SQL) to query Cassandra with Spark SQL

Question

I have a Cassandra cluster with a co-located Spark cluster, and I can run the usual Spark jobs by compiling them, copying them over, and using the ./spark-submit script. I wrote a small job that accepts SQL as a command-line argument, submits it to Spark as Spark SQL, Spark runs that SQL against Cassandra and writes the output to a csv file.

Now I feel like I'm going round in circles trying to figure out if it's possible to query Cassandra via Spark SQL directly in a JDBC connection (eg from Squirrel SQL). The Spark SQL documentation says

Connect through JDBC or ODBC.

A server mode provides industry standard JDBC and ODBC connectivity for
business intelligence tools.

The Spark SQL Programming Guide says

Spark SQL can also act as a distributed query engine using its JDBC/ODBC or
command-line interface. In this mode, end-users or applications can interact
with Spark SQL directly to run SQL queries, without the need to write any 
code.

So I can run the Thrift Server, and submit SQL to it. But what I can't figure out, is how do I get the Thrift Server to connect to Cassandra? Do I simply pop the Datastax Cassandra Connector on the Thrift Server classpath? How do I tell the Thrift Server the IP and Port of my Cassandra cluster? Has anyone done this already and can give me some pointers?

Kaushal Kaushal · Accepted Answer · 2015-12-10T16:24:48

Configure those properties in spark-default.conf file

spark.cassandra.connection.host    192.168.1.17,192.168.1.19,192.168.1.21
# if you configured security in you cassandra cluster
spark.cassandra.auth.username   smb
spark.cassandra.auth.password   bigdata@123

Start your thrift server with spark-cassandra-connector dependencies and mysql-connector dependencies with some port that you will connect via JDBC or Squirrel.

sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.bind.host 192.168.1.17 --hiveconf hive.server2.thrift.port 10003 --jars <shade-jar>-0.0.1.jar --driver-class-path <shade-jar>-0.0.1.jar

For getting cassandra table run Spark-SQL queries like

CREATE TEMPORARY TABLE mytable USING org.apache.spark.sql.cassandra OPTIONS (cluster 'BDI Cassandra', keyspace 'testks', table 'testtable');

Use JDBC (eg Squirrel SQL) to query Cassandra with Spark SQL

2 Answers