
I am using spark-cassandra-connector to connect to Cassandra from Spark.

I am able to connect through Livy successfully using the command below:

curl -X POST --data '{"file": "/my/path/test.py", "conf" : {"spark.jars.packages": "com.datastax.spark:spark-cassandra-connector_2.11:2.3.0", "spark.cassandra.connection.host":"myip"}}' -H "Content-Type: application/json" localhost:8998/batches

I am also able to connect interactively through the pyspark shell using the command below:

sudo pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:2.0.10 --conf spark.cassandra.connection.host=myip

However, I am not able to connect through spark-submit. Some of the commands I have tried are below.

spark-submit test.py --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.2 --conf spark.cassandra.connection.host=myip

This one didn't work.

I also tried passing these parameters inside the Python file used for spark-submit; that didn't work either.

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf().setAppName("Spark-Cassandracube")
        .set("spark.cassandra.connection.host", "myip")
        .set("spark.jars.packages", "com.datastax.spark:spark-cassandra-connector_2.11:2.3.0"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

I also tried passing these parameters using a Jupyter notebook:

import os

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.0 --conf spark.cassandra.connection.host="myip" pyspark-shell'

All the threads that I have seen so far talk about using spark-cassandra-connector with spark-shell, but not much about spark-submit.

Versions used:

Livy: 0.5.0
Spark: 2.4.0
Cassandra: 3.11.4

what Spark version do you have? – Alex Ott

1 Answer


Not tested, but the most probable cause is that you're specifying all options:

--packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.2 \
  --conf spark.cassandra.connection.host=myip

after the name of your script (test.py). In this case, spark-submit treats them as parameters for the script itself, not as options for spark-submit. Try moving the script name after the options, as shown below.
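For example (untested, reusing the connector version and host placeholder from the question):

spark-submit \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.2 \
  --conf spark.cassandra.connection.host=myip \
  test.py

Everything that appears after test.py on the command line is passed to the script itself as sys.argv, which is why your options were being silently ignored.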

P.S. See the Spark documentation for more details...