Take a look at this question: Scala + Spark - Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects.
Problem:
Suppose my mappers are functions (def) that internally call other classes, create objects, and do different things inside. (Or they could even be classes that extend (Foo) => Bar and do the processing in their apply method, but let's ignore that case for now.) A minimal sketch of the failure mode is below.
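For concreteness, here is a small illustration of what I mean; ThirdPartyParser, MyPipeline, and myMapper are made-up names standing in for a third-party class and my own driver code:

```scala
import org.apache.spark.rdd.RDD

// Stand-in for a third-party class that does NOT extend Serializable
class ThirdPartyParser {
  def parse(s: String): String = s.trim
}

class MyPipeline {
  val parser = new ThirdPartyParser

  // Referencing `parser` inside the lambda captures `this`, so Spark
  // tries to Java-serialize the entire MyPipeline instance, and the
  // job fails with java.io.NotSerializableException.
  def myMapper(rdd: RDD[String]): RDD[String] =
    rdd.map(line => parser.parse(line))
}
```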
Spark supports only Java serialization for closures. Is there ANY way out of this? Can we use something other than closures to do what I want? We can easily do this sort of thing with Hadoop. This single issue makes Spark almost unusable for me: one cannot expect every third-party library to have all of its classes extend Serializable!
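To be fair, there is one partial workaround I can see with plain closures: construct the object on the executor, inside mapPartitions, so that nothing non-serializable is captured (reusing the hypothetical ThirdPartyParser from above). But this only helps when I control the construction site, not when an object is handed to me already built:

```scala
import org.apache.spark.rdd.RDD

object Workaround {
  def myMapper(rdd: RDD[String]): RDD[String] =
    rdd.mapPartitions { iter =>
      // Created on the executor, one instance per partition; the closure
      // captures only the constructor call, so nothing non-serializable
      // ever needs to be serialized.
      val parser = new ThirdPartyParser
      iter.map(parser.parse)
    }
}
```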
Possible Solutions:
Would something like this be of any use: https://github.com/amplab/shark/blob/master/src/main/scala/shark/execution/serialization/KryoSerializationWrapper.scala
It certainly seems like a wrapper is the answer, but I cannot see exactly how to apply it; my best guess is sketched below.
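Here is a minimal sketch of what I imagine such a wrapper looks like (my own simplification, not Shark's exact code): the wrapper itself is Java-serializable, but it ships its payload as Kryo-encoded bytes, and Kryo's field-based serialization does not require the payload class to extend Serializable. This assumes Kryo can handle the class without explicit registration:

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}

// The wrapper is Java-serializable; the payload travels as Kryo bytes,
// so the payload class itself never needs to extend Serializable.
class KryoWrapper[T](@transient private var obj: T) extends Serializable {

  def value: T = obj

  // Replace default Java serialization of the payload with Kryo bytes
  private def writeObject(out: ObjectOutputStream): Unit = {
    val output = new Output(4096, -1) // growable in-memory buffer
    new Kryo().writeClassAndObject(output, obj)
    val bytes = output.toBytes
    out.writeInt(bytes.length)
    out.write(bytes)
  }

  private def readObject(in: ObjectInputStream): Unit = {
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    obj = new Kryo().readClassAndObject(new Input(bytes)).asInstanceOf[T]
  }
}
```

Presumably it would be used by capturing the wrapper in the closure and dereferencing it on the executor, something like:

```scala
// rdd is assumed to be an RDD[String] here
val wrapped = new KryoWrapper(new ThirdPartyParser)
rdd.map(line => wrapped.value.parse(line))
```

(Creating a fresh Kryo instance on every (de)serialization is wasteful; it just keeps the sketch short.) Is this the right idea, and is there a cleaner or officially supported way to do it?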