I'm using the Spark-Cassandra connector to read data from Cassandra. I have the following table in C*:

CREATE TABLE my_table (key uuid PRIMARY KEY, value text);

I want to fetch the my_table records that match a list of keys. I wrote the following code:

sc.cassandraTable("my_keyspace", "my_table")
   .select("value") 
   .where("key in ?", listOfKeys).collect()

The logs show the following error:

Caused by: java.io.IOException: Exception during preparation of SELECT "values" FROM "my_keyspace"."my_table" WHERE token("key") > ? AND token("key") <= ? AND key in ? ALLOW FILTERING: key cannot be restricted by more than one relation if it includes a IN

I found a related bug, resolved as Won't Fix, in the C* JIRA: https://issues.apache.org/jira/browse/CASSANDRA-6151

How can I read data from C* by primary key using the Spark-Cassandra connector?

Versions: Cassandra 2.1.9, Spark 1.6.1

1 Answer

I think you can use the joinWithCassandraTable method for this.

Something like this:

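import com.datastax.spark.connector._  // adds the connector methods to RDDs

// Wrap each key in Tuple1 so it maps onto the single-column primary key.
val keys = sc.parallelize(listOfKeys)
val rowsRDD = keys.map(Tuple1(_))
    .repartitionByCassandraReplica("my_keyspace", "my_table")
    .joinWithCassandraTable("my_keyspace", "my_table")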

You can find more about reading from C* in the connector documentation.

Keep in mind that using IN in the WHERE clause is usually not recommended.
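
For completeness, here is a slightly fuller sketch of the join approach as a standalone job. It is only an illustration under some assumptions: a Cassandra node reachable at 127.0.0.1, a local Spark master, a made-up key in listOfKeys, and non-null values in the value column. The object name ReadByKeys is hypothetical. It passes selectedColumns so that only the value column is fetched, mirroring the select("value") in the question.

```scala
import java.util.UUID

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical job name; adjust to your project.
object ReadByKeys {
  def main(args: Array[String]): Unit = {
    // Assumption: Cassandra is reachable at 127.0.0.1 and Spark runs locally
    // unless spark.master is already set (e.g. by spark-submit).
    val conf = new SparkConf()
      .setAppName("read-by-keys")
      .setIfMissing("spark.master", "local[*]")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Made-up key for illustration; replace with the real list of keys.
    val listOfKeys: Seq[UUID] = Seq(
      UUID.fromString("123e4567-e89b-12d3-a456-426614174000")
    )

    val values = sc.parallelize(listOfKeys)
      // Tuple1 maps each key onto the single-column primary key.
      .map(Tuple1(_))
      // Move each key to a Spark partition local to a replica that owns it.
      .repartitionByCassandraReplica("my_keyspace", "my_table")
      // One lookup per key; fetch only the "value" column.
      .joinWithCassandraTable(
        "my_keyspace", "my_table",
        selectedColumns = SomeColumns("value"))
      // Assumes "value" is never null; getString fails on a null column.
      .map { case (Tuple1(key), row) => key -> row.getString("value") }
      .collect()

    values.foreach { case (key, value) => println(s"$key -> $value") }
    sc.stop()
  }
}
```

Keys that have no matching row in my_table simply do not appear in the result, since the join only emits pairs for rows that exist.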