
I have set up a Spark and Cassandra cluster and am using the Spark Cassandra Connector in my Spark jobs. To run my jobs, I set spark.cassandra.connection.host to the IP address of one of the seed nodes in one data center. While going through the connector site, I found that it states:

"The initial contact node given in spark.cassandra.connection.host can be any node of the cluster. The driver will fetch the cluster topology from the contact node and will always try to connect to the closest node in the same data center. If possible, connections are established to the same node the task is running on." 

My question is: what happens if the contact node is down? Spark will not be able to get the cluster topology and hence will not work. I have also used the Node.js driver for Cassandra, where we provide an array of contact points. Is the same possible with the Spark Cassandra Connector?


2 Answers

1
votes

Well, according to the connector documentation,

Multiple hosts can be passed in using a comma separated list ("127.0.0.1,127.0.0.2"). These are the initial contact points only, all nodes in the local DC will be used upon connecting.

So feel free to add as many contact points as you are comfortable with. As long as at least one of them is reachable from your client, you're good to go.
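A minimal sketch of this, assuming you assemble the Spark properties yourself before submitting the job: the connector reads a single string property, so several contact points are joined into one comma-separated value. `cassandra_host_conf` and `spark_submit_conf_args` are hypothetical helper names, and the IP addresses are placeholders.

```python
def cassandra_host_conf(contact_points):
    """Build the spark.cassandra.connection.host setting from several contact points.

    Hypothetical helper: multiple hosts go into one comma-separated string.
    """
    # Strip whitespace and drop empty entries so the joined value stays clean.
    hosts = [h.strip() for h in contact_points if h.strip()]
    if not hosts:
        raise ValueError("at least one contact point is required")
    return {"spark.cassandra.connection.host": ",".join(hosts)}


def spark_submit_conf_args(conf):
    """Turn a dict of Spark properties into spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    conf = cassandra_host_conf(["10.0.0.1", "10.0.0.2", "10.0.0.3"])  # placeholder IPs
    print(conf)
    print(spark_submit_conf_args(conf))
```

If you build the configuration in PySpark instead, the same pairs can be applied with `SparkConf().setAll(conf.items())`.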

0
votes

No, you cannot pass an array of hosts into 'spark.cassandra.connection.host' (although, if you wanted to, you could write a service that checks the connection to every host and then puts one that is reachable into your SparkConf).
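The service idea in the parentheses could look roughly like this. It is a sketch under two assumptions: the nodes accept CQL connections on Cassandra's default native-protocol port 9042, and a successful TCP connect is a good enough liveness check (it only shows the port is open, not that the node is healthy). `first_reachable_host` and `tcp_probe` are hypothetical names.

```python
import socket


def tcp_probe(host, port, timeout):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, etc.
        return False


def first_reachable_host(hosts, port=9042, timeout=1.0, probe=tcp_probe):
    """Return the first host that accepts a connection, or None if none do.

    Hypothetical helper; 9042 is Cassandra's default CQL native-protocol port.
    `probe` is injectable so the selection logic can be tested without a cluster.
    """
    for host in hosts:
        if probe(host, port, timeout):
            return host
    return None
```

You would call `first_reachable_host([...])` before building the Spark configuration, fail fast if it returns `None`, and otherwise use the returned address as the value of `spark.cassandra.connection.host`.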

However, the documentation suggests we can assume that the connector will work with any WORKING node as the contact point (meaning that as long as the node you give as the host is up, it'll work). It says:

The initial contact node given in spark.cassandra.connection.host can be any node of the cluster.

Also, it sounds like if a node is down, the request will be retried on other nodes in the local data center (not on nodes in a different data center). The documentation says:

If some nodes in the local data center are down and a read or write operation fails, the operation won't be retried on nodes in a different data center.
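To make the quoted behaviour concrete, here is a toy model. It is not the connector's or the driver's actual code: an operation is attempted on each node of the local data center in turn, nodes in other data centers are never tried, and if every local node fails the operation fails. The `Node` type, `run_in_local_dc`, `AllLocalNodesFailed`, and the data-center names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    address: str
    dc: str  # data center the node belongs to


class AllLocalNodesFailed(Exception):
    """Raised when no node in the local data center could serve the operation."""


def run_in_local_dc(nodes, local_dc, operation):
    """Try operation(node) on local-DC nodes only; never fall back to another DC."""
    local_nodes = [n for n in nodes if n.dc == local_dc]
    errors = []
    for node in local_nodes:
        try:
            return operation(node)
        except ConnectionError as exc:  # treat as "node down" and try the next local node
            errors.append((node.address, exc))
    raise AllLocalNodesFailed(f"all {len(local_nodes)} nodes in {local_dc} failed: {errors}")
```

With nodes in `dc1` and `dc2` and `local_dc="dc1"`, a failure on every `dc1` node raises `AllLocalNodesFailed` even though the `dc2` node is healthy, which mirrors the "won't be retried on nodes in a different data center" statement.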

Hope this helps.