
The DataStax Spark Cassandra Connector is great for interacting with Cassandra through Apache Spark. With Spark SQL 1.1, we can use the Thrift server to interact with Spark from Tableau. Since Tableau can talk to Spark, and Spark can talk to Cassandra, there's surely some way to get Tableau talking to Cassandra through Spark (or rather Spark SQL). I can't figure out how to get this running. Ideally, I'd like to do this with a Spark Standalone cluster + a Cassandra cluster (i.e. without an additional Hadoop setup). Is this possible? Any pointers are appreciated.

Tableau just announced a driver for Spark SQL: tableausoftware.com/about/blog/2014/10/…. The article describes how to request a beta copy. – Alex Blakemore
Any idea on getting Spark + Tableau to query Cassandra? – ashic
Since Spark SQL can access Cassandra, it ought to be possible using the Tableau Spark SQL driver. Are you using the beta driver? If so, what specific problem do you have? (Or better yet, tell the beta program so they can fix it.) – Alex Blakemore
The way Spark SQL and Cassandra work together is you do sc = new SparkContext(..); cc = new CassandraCqlContext(sc); cc.sql("Select * ..."). When I'm running the Thrift server, how would I tell the Thrift server to do this? – ashic
I don't have the answer, but if you are using the Tableau beta driver, they give you an email contact for feedback. They are working with Databricks on that driver, so that is a better place to direct your question. – Alex Blakemore

1 Answer


HiveThriftServer2 has a startWithContext(sqlContext) method, so you can create your SQLContext referencing C* and the appropriate table / CF and then pass that context to the Thrift server.

So something like this:

// Run from the spark-shell, where sc is the predefined SparkContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

val sparkContext = sc
import sparkContext._

val sqlContext = new HiveContext(sparkContext)
import sqlContext._  // brings the implicit createSchemaRDD into scope

// Register a sample RDD as a temp table so it is queryable over the Thrift server
makeRDD((1, "hello") :: (2, "world") :: Nil).toSchemaRDD.cache().registerTempTable("t")

HiveThriftServer2.startWithContext(sqlContext)

So instead of starting the default Thrift server from Spark, you can just launch your custom one.
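
To point the same pattern at Cassandra rather than a dummy RDD, here's a minimal sketch. It assumes the spark-cassandra-connector is on the classpath; the keyspace test and table kv (columns key int, value text) are hypothetical. Since startWithContext expects a HiveContext, the Cassandra table is registered on that context rather than going through the connector's own SQL context:

// Run from the spark-shell with the spark-cassandra-connector on the classpath
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._
import com.datastax.spark.connector._  // adds cassandraTable to the SparkContext

// Hypothetical schema: CREATE TABLE test.kv (key int PRIMARY KEY, value text)
case class KV(key: Int, value: String)

val sqlContext = new HiveContext(sc)
import sqlContext._  // implicit createSchemaRDD

// Load the Cassandra table as an RDD of case classes and expose it to SQL
sc.cassandraTable[KV]("test", "kv").toSchemaRDD.registerTempTable("kv")

// Serve this context over JDBC/ODBC instead of the default Thrift server
HiveThriftServer2.startWithContext(sqlContext)

Tableau (or beeline, for a quick sanity check) should then be able to connect to the Thrift server on its usual port (10000 by default) and query the registered kv table.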