The DataStax Spark Cassandra Connector is great for interacting with Cassandra through Apache Spark. With Spark SQL 1.1, we can use the Thrift server to interact with Spark from Tableau. Since Tableau can talk to Spark, and Spark can talk to Cassandra, there's surely some way to get Tableau talking to Cassandra through Spark (or rather Spark SQL), but I can't figure out how to get this running. Ideally, I'd like to do this with a Spark Standalone cluster plus a Cassandra cluster (i.e. without an additional Hadoop setup). Is this possible? Any pointers are appreciated.
3 votes
Tableau just announced a driver for Spark SQL tableausoftware.com/about/blog/2014/10/…. The article describes how to request a beta copy.
– Alex Blakemore
Any idea on getting Spark + Tableau to query Cassandra?
– ashic
Since Spark SQL can access Cassandra, it ought to be possible using the Tableau Spark SQL driver. Are you using the beta driver? If so, what specific problem are you having? (Or better yet, tell the beta program so they can fix it.)
– Alex Blakemore
The way Spark SQL and Cassandra work together is you do sc = new SparkContext(..); cc = new CassandraCqlContext(sc); cc.sql("SELECT * ..."). When I'm running the Thrift server, how would I tell it to do this?
– ashic
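For reference, a minimal sketch of that pattern, assuming the connector's CassandraSQLContext (presumably what CassandraCqlContext refers to above) and a hypothetical keyspace/table test.kv:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.cassandra.CassandraSQLContext
// Tell the connector where the Cassandra cluster lives
val conf = new SparkConf()
  .setAppName("cassandra-sql")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)
val cc = new CassandraSQLContext(sc)
cc.setKeyspace("test")                                // hypothetical keyspace
cc.sql("SELECT * FROM kv").collect().foreach(println) // hypothetical table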
I don't have the answer, but if you are using the Tableau beta driver, they give you an email contact for feedback. They are working with Databricks on that driver, so that is a better place to direct your question.
– Alex Blakemore
1 Answer
3 votes
The Hive Thrift server has a HiveThriftServer2.startWithContext(sqlContext) entry point, so you can create a sqlContext referencing C* and the appropriate table / CF and then hand that context to the Thrift server.
So something like this:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._
val sqlContext = new HiveContext(sc)
import sqlContext._  // brings in the implicits (e.g. createSchemaRDD) used below
// Register a throwaway RDD as a table so there is something to query
sc.makeRDD((1, "hello") :: (2, "world") :: Nil)
  .toSchemaRDD
  .cache()
  .registerTempTable("t")
// Start the Thrift server against this context instead of the default one
HiveThriftServer2.startWithContext(sqlContext)
So instead of starting the default Thrift server that ships with Spark, you can just launch your own custom one.
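To point the same trick at Cassandra rather than a throwaway RDD, a sketch (untested; it assumes the spark-cassandra-connector is on the classpath, plus a hypothetical keyspace test with a table kv matching the case class below) would register the Cassandra table in the same HiveContext before starting the server:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._
import com.datastax.spark.connector._
// Hypothetical layout of the Cassandra table test.kv
case class KV(key: Int, value: String)
val sqlContext = new HiveContext(sc)
import sqlContext._
// Pull the table through the connector and expose it to SQL clients
sc.cassandraTable[KV]("test", "kv").toSchemaRDD.registerTempTable("kv")
HiveThriftServer2.startWithContext(sqlContext)
Once it's running, connecting beeline (or Tableau's Spark SQL beta driver) to jdbc:hive2://<host>:10000 should let you SELECT from kv.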