I am trying to map the values from a CSV file into an RDD, but I get the following error because some of the fields are empty.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NumberFormatException: empty String
Following is the code I am using.
// Load and parse the data
val data = sc.textFile("data.csv")
val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble))).cache()
Is there any way to check whether a field is empty? I thought of doing it with a try/catch, but my attempt doesn't compile.
val parsedData = data.map { s =>
  try {
    Vectors.dense(s.split(',').map(_.toDouble))
  } catch {
    case e: NumberFormatException =>
      println("Nulls somewhere")
      Vectors.zeros(0) // placeholder so every branch returns a Vector
  }
}
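One way to sketch this, assuming you want to skip the bad rows entirely rather than substitute a default: wrap the parse in `scala.util.Try` and use `flatMap` so rows that fail to parse are simply dropped. The `parseLine` helper below is a hypothetical name, not from the original code.

```scala
import scala.util.Try

// Parse one CSV line into doubles; returns None if any field is
// empty or non-numeric (hypothetical helper for illustration).
def parseLine(s: String): Option[Array[Double]] =
  Try(s.split(',').map(_.trim.toDouble)).toOption

// With Spark, flatMap keeps only the successfully parsed rows:
// val parsedData = data.flatMap(s => parseLine(s).map(Vectors.dense)).cache()
```

This keeps the RDD's element type clean (every element is a real vector) instead of mixing in sentinel values from a catch branch.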