
I have two Spark jobs that connect to Cassandra using the spark-cassandra-connector: https://github.com/datastax/spark-cassandra-connector

The first job uses Kafka to stream data into Spark and processes it in real time. After processing each message, it saves the message to Cassandra.

The second job is a batch job that is deployed every 10 seconds to read data from Cassandra.

So one streaming Spark job writes data to a Cassandra keyspace while the other batch job is deployed again and again to read data from the SAME keyspace. My question is:

Can you open two sessions from two Spark jobs to read/write the same keyspace?

Note: I am also using the same username/password to connect to Cassandra from both Spark jobs.
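For reference, the setup above can be sketched with the connector's API (`saveToCassandra` and `cassandraTable` are real connector calls; the keyspace, table, column, and host names here are hypothetical placeholders):

```scala
// Hedged sketch of the two jobs; keyspace/table/column names are made up.
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Streaming job (writer): after processing each Kafka message, persist it.
// processedStream.foreachRDD { rdd =>
//   rdd.saveToCassandra("my_keyspace", "messages",
//     SomeColumns("id", "payload", "ts"))
// }

// Batch job (reader): deployed every 10 seconds, reads the same keyspace
// with the same credentials.
val conf = new SparkConf()
  .setAppName("batch-reader")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.auth.username", "user") // same username/password
  .set("spark.cassandra.auth.password", "pass") // as the streaming job
val sc = new SparkContext(conf)
val rows = sc.cassandraTable("my_keyspace", "messages")
```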

Comments:

RussS: Do you want the batch job to be cumulative or just the last 10 seconds?

Behroz Sikander: I want it to be cumulative.

1 Answer


I found the solution. The problem had nothing to do with Cassandra. My Spark cluster had very limited resources, and all of them were taken by my streaming job. When I deployed my batch job, there were no resources left to allocate, so it sat in a waiting state. Once the other job finished, my batch job was able to run.

I changed the configuration of both of my Spark jobs to use only 1 core and 1 GB of RAM for the driver/executor. Now both jobs run in parallel without any issue. Both use the same username/password to connect to Cassandra; one job writes to Cassandra while the other reads from the same keyspace.
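The resource limits described above can be set at submission time. A sketch using standard spark-submit flags (the class and jar names are hypothetical placeholders):

```shell
# Cap each job at 1 core and 1 GB of RAM per driver/executor so that
# both jobs can be scheduled at the same time on a small cluster.
# Submit each of the two jobs with flags like these.
spark-submit \
  --class com.example.BatchReader \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  my-job.jar
```

On a standalone cluster, `--conf spark.cores.max=1` similarly caps the total cores a single application may claim, which keeps one job from starving the other.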

The replication factor for my keyspace is 1.

Since my batch job was hanging, I thought the problem must be with Cassandra, because I was reading from and writing to the same keyspace. This was my first time interacting with Cassandra, so I jumped to that conclusion.