3
votes

The DataStax Spark Cassandra Connector (SCC) takes "spark.cassandra.connection.host" to connect to a Cassandra cluster.

  1. Can we provide the headless service of a C* cluster running on K8s as the host for this parameter ("spark.cassandra.connection.host")?

  2. Will it resolve the contact points?

  3. What is the preferred way of connecting to a C* cluster on K8s with the Spark Cassandra Connector?

2 Answers

2
votes

By default, SCC resolves all provided contact points into IP addresses on the first connection and then uses only those IP addresses for reconnection. After the initial connection, it discovers the rest of the cluster. Usually this is not a problem, as SCC should receive notifications about nodes going up and down and track the nodes' IP addresses. In practice, though, nodes can restart too quickly and the notifications are not received, so Spark jobs that use SCC can get stuck trying to connect to IP addresses that are no longer valid - I hit this multiple times on DC/OS.

This problem is solved in SCC 2.5.0, which includes a fix for SPARKC-571. It introduced a new configuration parameter, spark.cassandra.connection.resolveContactPoints. When it is set to false (the default is true), SCC always uses the hostnames of the contact points for both the initial connection and reconnection, avoiding the problems with changed IP addresses.

So on K8s I would try using this configuration parameter with a normal Cassandra deployment.
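As a rough illustration of the settings described above, here is a minimal sketch that builds the connector properties and turns them into spark-submit arguments. The helper names (scc_conf, to_submit_args) and the hostname "cassandra" are hypothetical placeholders, not part of SCC; only the three property names come from the connector, and resolveContactPoints requires SCC 2.5.0 or later.

```python
def scc_conf(service_host, port=9042, resolve_contact_points=False):
    """Build Spark properties for the Spark Cassandra Connector.

    service_host is assumed to be a DNS name (for example a K8s service),
    so resolveContactPoints defaults to False here to keep SCC reconnecting
    by hostname instead of by a cached IP address.
    """
    return {
        "spark.cassandra.connection.host": service_host,
        "spark.cassandra.connection.port": str(port),
        "spark.cassandra.connection.resolveContactPoints":
            str(resolve_contact_points).lower(),
    }


def to_submit_args(conf):
    """Flatten the properties into '--conf key=value' pairs for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "cassandra" is a placeholder hostname, not a value mandated by SCC.
    print(" ".join(to_submit_args(scc_conf("cassandra"))))
```

The same key/value pairs could instead be set on the Spark session builder; the command-line form is used here only because it needs no Spark installation to try out.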

1
votes

Yes, why not. There is a good example in the official Kubernetes documentation. You create a headless service with a selector:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
  - port: 9042
  selector:
    app: cassandra

and basically, when you specify spark.cassandra.connection.host=cassandra (in the same K8s namespace; otherwise you have to provide cassandra.<namespace>.svc.cluster.local), it will resolve to the Cassandra contact points (the Pod IP addresses where Cassandra is running).
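To check what a service name actually resolves to, a small sketch like the following can be run from a Pod in the cluster. The function names and the namespace "data" are made up for illustration, and "cluster.local" is assumed to be the cluster domain (the common default, but configurable). For a headless service, the lookup is expected to return one address per ready Cassandra Pod rather than a single cluster IP.

```python
import socket


def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Build the fully qualified DNS name of a K8s service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve_contact_points(host, port=9042):
    """Return the sorted, de-duplicated IP addresses that host resolves to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "data" is a hypothetical namespace; substitute your own.
    name = service_fqdn("cassandra", "data")
    try:
        print(name, "->", resolve_contact_points(name))
    except socket.gaierror as err:
        print(f"could not resolve {name}: {err}")
```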

✌️