
I looked at the spark-cassandra-connector GitHub repo and did not find a ReaderBuilder implemented there, only a WriterBuilder. Can anyone help me with this? I want to read data from a Cassandra database using a CassandraConnector reference.

I want to connect to two Cassandra clusters from the same SparkContext and read data from both of them, so I need a ReaderBuilder for reading data from my second Cassandra cluster. I am working in Java.

GitHub repo link: https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/java/com/datastax/spark/connector/japi/RDDAndDStreamCommonJavaFunctions.java

CassandraConnector eventsConnector = CassandraConnector.apply(sc.getConf().set("spark.cassandra.connection.host", "192.168.36.234"));
github.com/datastax/spark-cassandra-connector/blob/master/doc/…. Not an answer to your question, but this shows a way to connect to multiple clusters from the same SparkContext. – undefined_variable
I tried this approach, but as you can see it covers reading from one cluster and writing to another. With it, reading from the second cluster is not possible, so the problem remains. – Yash Tandon

1 Answer


My first suggestion would be not to use RDDs in Java. RDDs are much more cumbersome to use from Java than from Scala, and the RDD API is also the older API. I would suggest using DataFrames instead. They provide a much cleaner interface between different data sources, as well as automatic optimizations and other benefits.
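For the DataFrame route, the connector documentation describes cluster-level settings stored in the Spark configuration under keys of the form `clusterName/propertyName`, plus a `cluster` read option that selects which cluster's settings a read uses. Below is a minimal sketch, assuming that convention (check it against the docs for your connector version) and the hypothetical cluster names `ClusterOne`/`ClusterTwo`, of the two maps you would build in Java. The Spark calls themselves appear only as comments, so the helper class `TwoClusterConfig` (hypothetical, not part of the connector) compiles with just the JDK.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of the connector): builds the string maps
// that the connector's DataFrame source is assumed to read.
class TwoClusterConfig {

    // Cluster-scoped setting, assuming the "clusterName/propertyName" key format.
    static Map<String, String> clusterConf(String cluster, String host) {
        Map<String, String> conf = new HashMap<>();
        conf.put(cluster + "/spark.cassandra.connection.host", host);
        return conf;
    }

    // Per-read options; "cluster" picks which cluster-scoped settings apply.
    static Map<String, String> readOptions(String cluster, String keyspace, String table) {
        Map<String, String> opts = new HashMap<>();
        opts.put("cluster", cluster);
        opts.put("keyspace", keyspace);
        opts.put("table", table);
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.putAll(clusterConf("ClusterOne", "127.0.0.1"));       // placeholder host
        conf.putAll(clusterConf("ClusterTwo", "192.168.36.234"));

        // With a SparkSession named spark, this would be applied roughly as:
        //   conf.forEach((k, v) -> spark.conf().set(k, v));
        //   Dataset<Row> events = spark.read()
        //       .format("org.apache.spark.sql.cassandra")
        //       .options(readOptions("ClusterTwo", "ks", "test_table"))
        //       .load();
        conf.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Reading from the first cluster is the same `load()` call with `readOptions("ClusterOne", ...)`, so both tables end up as ordinary DataFrames in one session.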

If you cannot use DataFrames, create the CassandraJavaRDD as usual and then call "withConnector" (or "withReadConf") on it to change the connection or read configuration.

https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/java/com/datastax/spark/connector/japi/rdd/CassandraJavaRDD.java#L123-L129

Something like

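import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import com.datastax.spark.connector.cql.CassandraConnector;
import com.datastax.spark.connector.japi.CassandraRow;
import java.util.List;

// getConf() returns a copy, so the SparkContext's own (first-cluster) settings stay unchanged.
CassandraConnector cluster2 = CassandraConnector.apply(
    sc.getConf().set("spark.cassandra.connection.host", "192.168.36.234"));

// withConnector points this read at the second cluster.
List<CassandraRow> rows = javaFunctions(sc)
    .cassandraTable(ks, "test_table")
    .withConnector(cluster2)
    .collect();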

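There is no need for a reader builder because the RDD itself has a fluent API: calls such as withConnector and withReadConf return a new RDD with the changed configuration, and nothing is read until an action such as collect() runs. Writing, by contrast, happens immediately when the call completes, so its configuration has to be assembled beforehand, which is why a WriterBuilder is needed.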
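To illustrate the design point, here is a toy class (hypothetical, not connector code) that mimics only the shape of that fluent API: each `with...` call returns a reconfigured copy and leaves the original untouched, and the configuration is consumed only by the terminal `collect()`. Because every intermediate object is already complete and usable, there is nothing for a separate builder to do.

```java
// Hypothetical toy, not connector code: an immutable "scan" configured
// through with... methods, loosely mimicking how CassandraJavaRDD is used.
final class ToyScan {
    private final String host;
    private final String table;

    ToyScan(String host, String table) {
        this.host = host;
        this.table = table;
    }

    // Returns a modified copy; the receiver keeps its original configuration.
    ToyScan withHost(String newHost) {
        return new ToyScan(newHost, table);
    }

    // Stand-in for an action such as collect(): only here is the configuration used.
    String collect() {
        return "read " + table + " from " + host;
    }

    public static void main(String[] args) {
        ToyScan first = new ToyScan("10.0.0.1", "test_table");
        ToyScan second = first.withHost("192.168.36.234");
        System.out.println(first.collect());  // read test_table from 10.0.0.1
        System.out.println(second.collect()); // read test_table from 192.168.36.234
    }
}
```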