By default, SCC resolves all provided contact points into IP addresses on the first connection and then uses only those IP addresses for reconnection; after the initial connection is established, it discovers the rest of the cluster. Usually this is not a problem, since SCC receives notifications about nodes going up and down and tracks their IP addresses. In practice, however, nodes can be restarted so quickly that these notifications are missed, and Spark jobs that use SCC can get stuck trying to connect to IP addresses that are no longer valid - I hit this multiple times on DC/OS.
This problem is solved in SCC 2.5.0, which includes a fix for SPARKC-571. It introduces a new configuration parameter, spark.cassandra.connection.resolveContactPoints: when set to false (true by default), SCC always uses the hostnames of the contact points for both the initial connection and reconnection, avoiding the problems with changed IP addresses.
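For example, here is a minimal sketch of setting this parameter on a SparkSession; the application name and hostnames are placeholders, not anything prescribed by SCC:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("scc-hostname-reconnect") // hypothetical app name
  // Pass contact points as hostnames; with resolveContactPoints=false
  // SCC keeps the hostnames instead of pinning the IPs resolved at startup.
  .config("spark.cassandra.connection.host", "cassandra-0.example.com,cassandra-1.example.com")
  .config("spark.cassandra.connection.resolveContactPoints", "false")
  .getOrCreate()
```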
So on K8s I would try using this configuration parameter with a normal Cassandra deployment, passing stable service hostnames as the contact points.
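As a sketch of what that could look like, the job below addresses Cassandra through a Kubernetes headless-service DNS name, which stays stable even when pod IPs change on restart; the service name, namespace, keyspace, and table here are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("k8s-cassandra-job") // hypothetical app name
  // Hypothetical headless service "cassandra" in namespace "db".
  .config("spark.cassandra.connection.host", "cassandra.db.svc.cluster.local")
  .config("spark.cassandra.connection.resolveContactPoints", "false")
  .getOrCreate()

// Read a table through the connector's data source (keyspace/table are placeholders).
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "tbl"))
  .load()
df.show()
```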