
I see a difference between using an underscore placeholder and a named parameter in Spark's map function.

Look at this code (executed in spark-shell):

var ds = Seq(1,2,3).toDS()
ds.map(t => Array("something", "" + t)).collect // works cool
ds.map(Array("funk", "" + _)).collect // doesn't work

The compile error I get for the line that doesn't work is:

error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.


1 Answer


That's because the expansion of:

ds.map(Array("funk", "" + _)).collect 

doesn't work the way you think it does. It expands to:

ds.map(Array("funk", ((x: Any) => "" + x))).collect 

The _ in your array creation expands to a function, so the array ends up containing a function value. As the Dataset documentation (and the error message) says, only primitive types and Product types (case classes) have encoders; functions are not supported, which is why no encoder can be found.
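
In case it helps, here is a small sketch of the variants that do compile, assuming the standard spark-shell session (where spark and spark.implicits._ are already in scope; add them yourself in a standalone program):

// Assumes spark-shell: `spark` and spark.implicits._ are already imported.
val ds = Seq(1, 2, 3).toDS()

// Works: the lambda is the whole map argument, so each element yields an
// Array[String], and an encoder for Array[String] exists.
ds.map(t => Array("funk", "" + t)).collect

// Also works: the underscore is the direct argument of map, so it expands
// to t => "" + t and the result type is String.
ds.map("" + _).collect

// Doesn't compile: the underscore expands inside Array(...), so map receives
// an array containing a function, and there is no encoder for that.
// ds.map(Array("funk", "" + _)).collect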

If we take a minimal reproduction:

val l = List(1,2,3)
val res = l.map(Array("42", "" + _))

and look at the typer expansion (scalac -Xprint:typer), we see:

def main(args: Array[String]): Unit = {
  val l: List[Int] = scala.collection.immutable.List.apply[Int](1, 2, 3);
  val res: List[Object] = 
    l.map[Object, List[Object]]
    (scala.Predef.wrapRefArray[Object]
      (scala.Array.apply[Object]("42", ((x$1: Any) => "".+(x$1))

If we isolate the relevant part, we can see that:

(x$1: Any) => "".+(x$1)

is the expansion that happens inside the array creation.
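
As a rule of thumb, the placeholder expands to the innermost expression that contains it, not to the whole argument of map. A small plain-Scala sketch (no Spark needed) contrasting the placements:

val l = List(1, 2, 3)

// Placeholder as the direct argument of map: expands to t => "" + t.
val a = l.map("" + _)                    // List("1", "2", "3")

// Explicit lambda around the whole array: each element yields its own Array[String].
val b = l.map(t => Array("42", "" + t))  // Array("42","1"), Array("42","2"), Array("42","3")

// Placeholder inside Array(...): the expansion stops at "" + _, so the array holds a
// function value -- the (x$1: Any) => "".+(x$1) seen in the typer output above.
// Written out explicitly that call is l.map(Array("42", (x: Any) => "" + x)), which
// type-checks (the array is implicitly wrapped as a Seq, i.e. a function from index
// to element) but is almost certainly not what was intended.

println(a)
println(b.map(_.mkString("[", ",", "]")))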