
I am trying to write a DataFrame to Cassandra using PySpark, but it's throwing an error:

py4j.protocol.Py4JJavaError: An error occurred while calling o74.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 3.0 failed 4 times, most recent failure: Lost task 6.3 in stage 3.0 (TID 24, ip-172-31-11-193.us-west-2.compute.internal, executor 1):
java.lang.NoClassDefFoundError: com/twitter/jsr166e/LongAdder
    at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsSupport$class.$init$(OutputMetricsUpdater.scala:107)
    at org.apache.spark.metrics.OutputMetricsUpdater$TaskMetricsUpdater.(OutputMetricsUpdater.scala:153)
    at org.apache.spark.metrics.OutputMetricsUpdater$.apply(OutputMetricsUpdater.scala:75)
    at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:209)
    at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:197)
    at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:183)
    at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
    at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Below is my write code:

DataFrame.write.format(
   "org.apache.spark.sql.cassandra"
).mode(
   'append'
).options(
   table="student1", 
   keyspace="university"
).save()

I have added the Spark Cassandra connector mentioned below in spark-defaults.conf:

spark.jars.packages datastax:spark-cassandra-connector:2.4.0-s_2.11

I am able to read data from Cassandra; the issue is only with writing.


1 Answer


I am not an expert on Spark, but this might help:

These errors are commonly thrown when the Spark Cassandra Connector or its dependencies are not on the runtime classpath of the Spark Application. This is usually caused by not using the prescribed --packages method of adding the Spark Cassandra Connector and its dependencies to the runtime classpath.
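Concretely, one way to follow the prescribed approach is to pass the connector at launch time instead of (or in addition to) editing spark-defaults.conf. This is a sketch, not a verified fix: it assumes a Spark 2.4 / Scala 2.11 build, uses the Maven Central coordinates for the connector (which declare transitive dependencies such as `com.twitter:jsr166e`, the missing class in your trace), and `your_app.py` is a placeholder for your script:

```shell
# Hedged sketch: supply the connector via --packages so it and its
# transitive dependencies (including jsr166e) are shipped to every
# executor's classpath. Coordinates assume Spark 2.4 / Scala 2.11.
spark-submit \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.0 \
  your_app.py   # placeholder for your PySpark script
```

Note the coordinate format difference: `com.datastax.spark:spark-cassandra-connector_2.11:2.4.0` resolves from Maven Central with its full dependency tree, whereas the `datastax:spark-cassandra-connector:2.4.0-s_2.11` form in your spark-defaults.conf resolves from the spark-packages repository, which can leave dependencies like jsr166e off the executor classpath.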

Source: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md#why-cant-the-spark-job-find-spark-cassandra-connector-classes-classnotfound-exceptions-for-scc-classes