Say I have the following case class, where I define an apply function in its companion object:

case class MyClass(a: Int, b: String) 
object MyClass {
  def apply(a: Int, b: String) = {
    if (b == "")
      new MyClass(a, a.toString)
    else
      new MyClass(a, b)
  }
}

println(MyClass(1, ""))  // prints MyClass(1,1)
println(MyClass(2, "3")) // prints MyClass(2,3)

But now say I want to have this behavior with my datasets:

val dx: Dataset[MyClass] = Seq((1,"b"))
        .toDF("a", "b")
        .as[MyClass]

dx.show(false)

This prints, as expected:

+---+---+
|a  |b  |
+---+---+
|1  |b  |
+---+---+

But if I now do:

val dx: Dataset[MyClass] = Seq((1,""))
        .toDF("a", "b")
        .as[MyClass]

dx.show(false)

I get:

+---+---+
|a  |b  |
+---+---+
|1  |   |
+---+---+

whereas I expected:

+---+---+
|a  |b  |
+---+---+
|1  |1  |
+---+---+

Why is that? Is there a way to make the apply function take effect when the Dataset is built?

Thanks a lot for the explanation

Apparently the as method bypasses apply and calls the constructor directly. I believe you could do this: Seq((1,"b")).toDF("a", "b").as[(Int, String)].map(MyClass.apply.tupled) - Can you confirm whether this works? - Luis Miguel Mejía Suárez
This worked: Seq((1,"")).toDF("a", "b").map(r => MyClass.apply(r.getInt(0), r.getString(1))) - fricadelle
If a row-wise map is necessary for enabling apply, perhaps using a when/otherwise transformation to replace the custom apply is worth considering. As a side note, it's baffling why this question warrants downvotes. - Leo C
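The two suggestions in the comments above can be sketched side by side. This is a minimal sketch, not a definitive implementation: it assumes a recent Spark and Scala 2.12.2 or later (where a companion may define an apply with the same signature as the case class), and the names ApplyDemo, viaApply and viaColumns are made up for illustration. The first helper routes every row through the custom apply with a typed map; the second expresses the same rule with when/otherwise on the column so that no row-wise map is needed.

```scala
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, when}

// Must be top-level (not nested in a method) so Spark can derive an encoder.
case class MyClass(a: Int, b: String)
object MyClass {
  def apply(a: Int, b: String): MyClass =
    if (b == "") new MyClass(a, a.toString) else new MyClass(a, b)
}

// ApplyDemo, viaApply and viaColumns are hypothetical names for this sketch.
object ApplyDemo {

  // Row-wise map: read the rows as tuples (tuple columns are matched by
  // position), then build each MyClass through the custom apply.
  def viaApply(df: DataFrame): Dataset[MyClass] = {
    val spark = df.sparkSession
    import spark.implicits._
    df.as[(Int, String)].map { case (a, b) => MyClass(a, b) }
  }

  // Column expression: replace an empty b with a cast to string, then
  // convert with as[MyClass], which calls the constructor directly.
  def viaColumns(df: DataFrame): Dataset[MyClass] = {
    val spark = df.sparkSession
    import spark.implicits._
    df.withColumn("b", when(col("b") === "", col("a").cast("string")).otherwise(col("b")))
      .as[MyClass]
  }
}
```

With val df = Seq((1, ""), (2, "3")).toDF("a", "b"), both ApplyDemo.viaApply(df) and ApplyDemo.viaColumns(df) should yield a Dataset[MyClass] in which the empty b has been replaced by the value of a. The column-based variant duplicates the rule from apply, so the two have to be kept in sync by hand; the map variant reuses apply but deserializes every row.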