
I am trying to run this code:

    rddCollected = rddCollect.mapValues(lambda x: (x, 1))
    rddCollected.collect()
    rddCollectJoin = rddCollected.reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))
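For reference, the two steps build a per-key `(sum, count)` pair, the usual building block for computing per-key averages. A minimal pure-Python sketch of what the two lambdas compute (no Spark involved; the sample `pairs` data is made up for illustration):

```python
from collections import defaultdict

# Sample key/value pairs, standing in for rddCollect's contents.
pairs = [("a", 10), ("b", 20), ("a", 30), ("b", 40), ("b", 50)]

# mapValues(lambda x: (x, 1)) -> each value becomes (value, 1)
mapped = [(k, (v, 1)) for k, v in pairs]

# reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))
# -> per key, add up the values and add up the counts
reduced = defaultdict(lambda: (0, 0))
for k, (v, c) in mapped:
    s, n = reduced[k]
    reduced[k] = (s + v, n + c)

print(dict(reduced))  # {'a': (40, 2), 'b': (110, 3)}
```

The difference in Spark is that `reduceByKey` triggers a shuffle across partitions, which is why it can fail with a temp-file error while the earlier `collect()` succeeds.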

`rddCollected` runs fine with `collect()`, but `rddCollectJoin` fails with the error below.

    Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 58.0 failed 1 times, most recent failure: Lost task 0.0 in stage 58.0 (TID 78, localhost, executor driver): java.io.FileNotFoundException: C:\Users\lenovo\AppData\Local\Temp\blockmgr-431169ff-717a-4728-b9b2-c2ed1b4b5b20\0c\temp_shuffle_d089dc45-014d-4d07-b0c0-ee917ad1b501 (The system cannot find the path specified)

My Java version is 1.8. I originally had Java 10, but downgraded to 8 because there were issues with 10. Can anyone help?

Just try to re-run it; this happens sometimes. - Kaushal
I closed and re-ran it, and it worked. What could the possible issue be? - Monika Samant
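One possible explanation (an assumption, not confirmed in this thread): on Windows, this `FileNotFoundException` on a `temp_shuffle_*` file often means the shuffle's scratch files in the user temp directory were removed or blocked mid-job (for example by antivirus or temp cleanup). A commonly suggested workaround is to point Spark's scratch space at a dedicated directory via the real `spark.local.dir` setting in `spark-defaults.conf` (the path `C:\spark-temp` below is a hypothetical example):

```
spark.local.dir  C:\spark-temp
```

The directory must exist and be writable by the user running Spark.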

1 Answer


I was facing a similar issue with `.collect()`: I was working with a class that was not serializable. Implement `Serializable` on that class and try again; it worked for me.