multiple contact points in the spark cassandra connector

Question

I have set up a Spark and Cassandra cluster and am using the Cassandra connector in my Spark jobs. To run my jobs, I set spark.cassandra.connection.host to the IP address of one of the seed nodes in one data centre. I was going through the connector documentation, and it states:

"The initial contact node given in spark.cassandra.connection.host can be any node of the cluster. The driver will fetch the cluster topology from the contact node and will always try to connect to the closest node in the same data center. If possible, connections are established to the same node the task is running on."

My question is: what happens if that contact node is down? Spark would not be able to get the cluster topology and hence would not work. I have also used the Node.js connector for Cassandra, where we provide an array of contact points. Is this possible with the Spark Cassandra connector?

Mihai Caracostea · Accepted Answer · 2016-01-09T23:32:05

Well, according to the connector documentation,

"Multiple hosts can be passed in using a comma separated list ("127.0.0.1,127.0.0.2"). These are the initial contact points only, all nodes in the local DC will be used upon connecting."

So feel free to add as many contact points as you feel comfortable with. As long as at least one of them is reachable from your client, you're good to go.
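As a minimal sketch, assuming a Scala job built against spark-core with the connector on the classpath, this is what passing several contact points could look like when building the `SparkConf`. The node addresses `10.0.0.1` to `10.0.0.3`, the object name `ContactPoints` and the app name are hypothetical placeholders, not values from the question.

```scala
import org.apache.spark.SparkConf

object ContactPoints {
  // Hypothetical node addresses; substitute nodes from your own cluster,
  // ideally on different racks so a single failure can't take out all of them.
  val contactPoints: Seq[String] = Seq("10.0.0.1", "10.0.0.2", "10.0.0.3")

  // spark.cassandra.connection.host takes a comma-separated list.
  // These hosts are only used for the initial connection: once any one of
  // them answers, the driver learns the rest of the cluster topology.
  val conf: SparkConf = new SparkConf()
    .setAppName("multi-contact-points") // hypothetical application name
    .set("spark.cassandra.connection.host", contactPoints.mkString(","))

  def main(args: Array[String]): Unit =
    // Prints the joined list that the connector will receive.
    println(conf.get("spark.cassandra.connection.host"))
}
```

The same value can also be supplied at submit time instead of in code, e.g. `spark-submit --conf spark.cassandra.connection.host=10.0.0.1,10.0.0.2,10.0.0.3 ...`, which keeps cluster addresses out of the job itself.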
