I have a rather peculiar problem. In a DSE Spark analytics engine I produce frequent stats that I store to Cassandra in a small table. Since I keep the table trimmed and it is meant to serve a web interface with consolidated information, I simply want to query the whole table in Spark and send the results over an API. I have tried two methods for this:
Method 1, using the connector's RDD API:

val a = Try(sc.cassandraTable[Data](keyspace, table).collect()).toOption
Method 2, using Spark SQL:

val query = "SELECT * FROM keyspace.table"
val df = spark.sqlContext.sql(query)
val list = df.collect()
I am doing this in a Scala program. When I use method 1, the Spark job mysteriously gets stuck at stage 10 of 12 forever, which I have verified in both the logs and the Spark jobs page. When I use method 2, it simply tells me that no such table exists:
Unknown exception: org.apache.spark.sql.AnalysisException: Table or view not found: `keyspace1`.`table1`; line 1 pos 15;
'Project [*]
+- 'UnresolvedRelation `keyspace1`.`table1`
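Since I am after a workaround as much as a diagnosis: one option I am considering is to bypass the SQL catalog lookup entirely and register the table myself before querying it. A minimal sketch of what I mean, assuming the standard org.apache.spark.sql.cassandra DataFrame source of the spark-cassandra-connector (cassDf and the temp view name are just names I picked):

// Workaround sketch: load the table through the connector's DataFrame
// source instead of resolving it through the SQL catalog, then register
// it as a temp view so the existing SQL query can find it.
val cassDf = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "keyspace1", "table" -> "table1"))
  .load()

cassDf.createOrReplaceTempView("table1")
val list = spark.sql("SELECT * FROM table1").collect()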
Interestingly, I tested both methods in the Spark shell on the cluster and they work just fine. My program also has plenty of other queries done using method 1, and they all work fine as well. The key difference is that every one of them puts a condition on the main partition key, unlike this query; that holds true for the other queries against this particular table too. A sketch of what those working queries look like is below.
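For contrast, the working method-1 queries look roughly like this (a sketch, not a paste from my program; someUserId is a placeholder for an actual partition key value):

import scala.util.Try
import com.datastax.spark.connector._ // brings sc.cassandraTable into scope

// Sketch of the working method-1 queries: the partition key is constrained,
// so the connector can push the predicate down to Cassandra instead of
// pulling back the whole table.
val forUser = Try(
  sc.cassandraTable[Data](keyspace, table)
    .where("userid = ?", someUserId) // someUserId: placeholder value
    .collect()
).toOption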
Here is the table structure:
CREATE TABLE keyspace1.table1 (
    userid text,
    stat_type text,
    event_time bigint,
    stat_value double,
    PRIMARY KEY (userid, stat_type)
) WITH CLUSTERING ORDER BY (stat_type ASC);
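And for completeness, the case class I map rows into with method 1 looks roughly like this (field names assumed to mirror the column names, which the connector matches when mapping rows):

// Assumed row type for cassandraTable[Data]; one field per column.
case class Data(
  userid: String,
  stat_type: String,
  event_time: Long,
  stat_value: Double
)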
Any solid diagnosis of the problem, or a workaround, would be much appreciated.