I have the following three classes and I am getting a
Task not serializable
error. The full stack trace is below.
The first class is a serializable Person:
public class Person implements Serializable
{
    private String name;
    private int age;

    public String getName()
    {
        return name;
    }

    public void setName(String name)
    {
        this.name = name;
    }

    public int getAge()
    {
        return age;
    }

    public void setAge(int age)
    {
        this.age = age;
    }
}
This class reads from the text file and maps each line to a Person:
public class sqlTestv2 implements Serializable
{
    private int appID;
    private int dataID;
    private JavaSparkContext sc;

    public sqlTestv2(int appID, int dataID, JavaSparkContext sc)
    {
        this.appID = appID;
        this.dataID = dataID;
        this.sc = sc;
    }

    public JavaRDD<Person> getDataRDD()
    {
        JavaRDD<String> test = sc.textFile("hdfs://localhost:8020/user/cloudera/people.txt");
        JavaRDD<Person> people = test.map(
            new Function<String, Person>() {
                public Person call(String line) throws Exception {
                    String[] parts = line.split(",");
                    Person person = new Person();
                    person.setName(parts[0]);
                    person.setAge(Integer.parseInt(parts[1].trim()));
                    return person;
                }
            });
        return people;
    }
}
And this class retrieves the RDD and runs a query on it:
public class sqlTestv1 implements Serializable
{
    public static void main(String[] arg) throws Exception
    {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("wordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
        sqlTestv2 v2 = new sqlTestv2(1, 1, sc);
        JavaRDD<Person> test = v2.getDataRDD();
        DataFrame schemaPeople = sqlContext.createDataFrame(test, Person.class);
        schemaPeople.registerTempTable("people");
        DataFrame df = sqlContext.sql("SELECT age FROM people");
        df.show();
    }
}
Full error:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
	at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:294)
	at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:293)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
	at org.apache.spark.rdd.RDD.map(RDD.scala:293)
	at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:90)
	at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:47)
	at com.oreilly.learningsparkexamples.mini.java.sqlTestv2.getDataRDD(sqlTestv2.java:54)
	at com.oreilly.learningsparkexamples.mini.java.sqlTestv1.main(sqlTestv1.java:41)
Caused by: java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext
Serialization stack:
	- object not serializable (class: org.apache.spark.api.java.JavaSparkContext, value: org.apache.spark.api.java.JavaSparkContext@3c3b144b)
	- field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2, name: sc, type: class org.apache.spark.api.java.JavaSparkContext)
	- object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2, com.oreilly.learningsparkexamples.mini.java.sqlTestv2@3752fdda)
	- field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, name: this$0, type: class com.oreilly.learningsparkexamples.mini.java.sqlTestv2)
	- object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1@70c171ec)
	- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
	- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, )
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
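If I read the Serialization stack correctly, the anonymous Function keeps a hidden this$0 reference to the enclosing sqlTestv2 instance, which drags the JavaSparkContext field into the closure. Here is a minimal, Spark-free sketch of that same capture behavior (all class names here are made up for illustration; this is not my actual code):

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureCaptureDemo {
    // Stand-in for JavaSparkContext: a field that is not serializable.
    static class NotSerializableResource {}

    private final NotSerializableResource sc = new NotSerializableResource();

    // Anonymous inner class defined in an instance method: it captures
    // this$0 (the ClosureCaptureDemo instance), and with it the sc field.
    Serializable anonymousTask() {
        return new Serializable() {};
    }

    // Static nested class: no hidden this$0 reference, so it serializes fine.
    static class StandaloneTask implements Serializable {}

    // Try to Java-serialize an object, reporting success or failure.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (Exception e) { // NotSerializableException for the capturing case
            return false;
        }
    }

    public static void main(String[] args) {
        ClosureCaptureDemo demo = new ClosureCaptureDemo();
        System.out.println(serializes(demo.anonymousTask())); // false: this$0 -> sc
        System.out.println(serializes(new StandaloneTask())); // true
    }
}
```

This mirrors the chain in the stack trace: sqlTestv2$1 (the anonymous Function) -> this$0 (sqlTestv2) -> sc (JavaSparkContext).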