I have a dataset imputedcsv in which I want to randomly replace the null values in the Gender column with either Male or Female.
imputedcsv.groupBy("Gender").count.show()
+------+-----+
|Gender|count|
+------+-----+
|  null|   24|
|Female|  240|
|  Male|  242|
+------+-----+
One can fill the null values with a single value, but how can the null values of the column be filled randomly from a set of values, say {Male, Female}?
imputedcsv.na.fill("Male", Seq("Gender")).groupBy("Gender").count.show()
+------+-----+
|Gender|count|
+------+-----+
|Female|  240|
|  Male|  266|
+------+-----+
Instead of replacing the null values with the single value Male, I need to fill each of them randomly with either Male or Female.
Something like R's sample(c('Male','Female')).
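To make the intent concrete, here is a minimal, untested sketch of the kind of thing I have in mind. It assumes imputedcsv is the DataFrame shown above, that Gender is a string column, and that an even 50/50 split between the two values is acceptable. The names randomGender and filled are made up for illustration.

```scala
import org.apache.spark.sql.functions.{col, lit, rand, when}

// rand() produces a uniform value in [0, 1) for each row, so this column
// evaluates to "Male" or "Female" with roughly equal probability per row.
// (Assumption: a 50/50 split is wanted; change 0.5 for a different ratio.)
val randomGender = when(rand() < 0.5, lit("Male")).otherwise(lit("Female"))

// Keep the existing value where Gender is set; use the random pick only for nulls.
val filled = imputedcsv.withColumn(
  "Gender",
  when(col("Gender").isNull, randomGender).otherwise(col("Gender"))
)

filled.groupBy("Gender").count.show()
```

Since the pick is random, the counts would differ between runs unless a seed is passed, e.g. rand(42).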
For a single value, there is already the question "How to replace null values with a specific value in Dataframe using spark in Java?"
Any help is appreciated.