I want to fill null values in Spark conditionally, to make sure I consider each corner case of my data rather than blindly filling everything with a single replacement value. A sample could look like this:
import spark.implicits._                      // assuming a SparkSession named `spark`
import org.apache.spark.sql.functions.when

case class FooBar(foo: String, bar: String)

val myDf = Seq(("a", "first"), ("b", "second"), ("c", null), ("third", "fooBar"), ("someMore", null))
  .toDF("foo", "bar")
  .as[FooBar]
+--------+------+
| foo| bar|
+--------+------+
| a| first|
| b|second|
| c| null|
| third|fooBar|
|someMore| null|
+--------+------+
Unfortunately

myDf
  .withColumn(
    "bar",
    when(($"foo" === "c") && $"bar".isNull, "someReplacement")
  )
  .show
resets all the other, regular values in the column to null:
+--------+---------------+
| foo| bar|
+--------+---------------+
| a| null|
| b| null|
| c|someReplacement|
| third| null|
|someMore| null|
+--------+---------------+
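If I understand the semantics correctly, a when without a trailing otherwise evaluates to null for every row that matches no condition, so the snippet above is presumably equivalent to this sketch (using lit from org.apache.spark.sql.functions):

myDf
  .withColumn(
    "bar",
    when(($"foo" === "c") && $"bar".isNull, "someReplacement")
      .otherwise(lit(null))  // rows matching no condition fall through to null
  )
  .show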
Likewise, the multi-condition variant
myDf
  .withColumn(
    "bar",
    when(
      (($"foo" === "c") && $"bar".isNull) ||
      (($"foo" === "someMore") && $"bar".isNull),
      "someReplacement"
    )
  )
  .show
does not work either, even though this is exactly the pattern I would want to use to fill in values for the different classes / categories of foo.
I am curious how to fix this.
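My best guess is that I have to carry the existing column value through explicitly, e.g. with otherwise($"bar") (or a coalesce), but I am not sure whether that is the idiomatic way to do it. A sketch of what I have in mind, with a hypothetical second replacement value someOtherReplacement:

myDf
  .withColumn(
    "bar",
    when(($"foo" === "c") && $"bar".isNull, "someReplacement")
      .when(($"foo" === "someMore") && $"bar".isNull, "someOtherReplacement")
      .otherwise($"bar")  // keep the original value for all other rows
  )
  .show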