I get the error
ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.util.NoSuchElementException: None.get
when I run my job on a Dataproc cluster; when I run it locally, it completes without problems. I have reproduced the issue with the following toy example:
package com.deequ_unit_tests

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object reduce_by_key_example {
  def main(args: Array[String]): Unit = {
    // Set the log level to only print errors
    Logger.getLogger("org").setLevel(Level.ERROR)

    val spark: SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExamples.com")
      .getOrCreate()

    println("Step 1")
    val data = Seq(("Project", 1),
      ("Gutenberg’s", 1),
      ("Alice’s", 1),
      ("Adventures", 1),
      ("in", 1),
      ("Wonderland", 1),
      ("Project", 1),
      ("Gutenberg’s", 1),
      ("Adventures", 1),
      ("in", 1),
      ("Wonderland", 1),
      ("Project", 1),
      ("Gutenberg’s", 1))

    println("Step 2")
    val rdd = spark.sparkContext.parallelize(data)

    println("Step 3")
    val rdd2 = rdd.reduceByKey(_ + _)

    println("Step 4")
    rdd2.foreach(println)
  }
}
When I run this job on Dataproc, the error is thrown when executing the line
rdd2.foreach(println)
As additional context, I wasn't getting this error until some changes were applied to my company's Dataproc cluster. For colleagues running an equivalent PySpark version of the example above, changing
sc = SparkContext('local')
to
sc = SparkContext()
did the trick, but I couldn't find an equivalent solution in Spark with Scala. Do you have any idea what could be causing this issue? Any help is welcome.
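For reference, my best guess at the Scala equivalent (a sketch only; I'm assuming the master is meant to come from spark-submit or the cluster configuration rather than being hardcoded) would be to drop the .master("local[1]") call from the builder:

// Sketch, not a confirmed fix: no hardcoded master, so that spark-submit
// or the Dataproc runtime supplies it (assumption on my part).
val spark: SparkSession = SparkSession.builder()
  .appName("SparkByExamples.com")
  .getOrCreate()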