You can do this by defining one function that computes the mean of a row's non-null values, and another that fills the nulls in a row with a given value.
Given the DF you presented:
val df = sc.parallelize(List(
  (Some(1), Some(2), Some(3), Some(4), Some(5), Some(6)),
  (Some(2), Some(2), Some(2), None, Some(4), Some(0)),
  (Some(3), None, None, None, Some(6), Some(0)),
  (Some(4), None, None, Some(4), Some(4), Some(0))
)).toDF("a", "b", "c", "d", "e", "f")
We need a function to get the mean of a Row:
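import org.apache.spark.sql.Row
// Integer mean of the non-null values in a row (integer division truncates).
def rowMean(row: Row): Int = {
  val nonNulls = (0 until row.length).filterNot(row.isNullAt).map(row.getInt)
  // Assumption: fall back to 0 when every column is null, to avoid dividing by zero.
  if (nonNulls.isEmpty) 0 else nonNulls.sum / nonNulls.length
}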
And another to fill nulls in a Row:
// Build a copy of the row in which every null is replaced by `fill`.
def rowFillNulls(row: Row, fill: Int): Row = {
  Row.fromSeq((0 until row.length).map(i => if (row.isNullAt(i)) fill else row.getInt(i)))
}
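To sanity-check the arithmetic without a Spark session, the same two steps can be sketched on plain Scala collections. The names `RowMeanSketch`, `mean`, `fillNulls` and the `Option[Int]` row representation are illustrative assumptions, not part of the Spark API, and the fallback of 0 for an all-null row is an arbitrary choice.

```scala
object RowMeanSketch {
  // A row modelled as a sequence of optional Ints (None stands in for null).
  type OptRow = Seq[Option[Int]]

  // Integer mean of the defined values; integer division truncates toward zero.
  // Assumption: an all-None row falls back to 0 to avoid dividing by zero.
  def mean(row: OptRow): Int = {
    val present = row.flatten
    if (present.isEmpty) 0 else present.sum / present.length
  }

  // Replace every None with the given fill value.
  def fillNulls(row: OptRow, fill: Int): Seq[Int] = row.map(_.getOrElse(fill))

  // Third row of the example DataFrame: the mean is (3 + 6 + 0) / 3 = 3.
  val row: OptRow = Seq(Some(3), None, None, None, Some(6), Some(0))
  val filled: Seq[Int] = fillNulls(row, mean(row))
}
```

Because the mean is an Int, a row such as 2, 2, 2, 4, 0 (sum 10 over 5 values) fills with 2, and a row whose true mean is fractional is filled with the truncated value.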
Now compute the mean of each row (going through df.rdd, so the map yields an RDD of (Row, Int) pairs regardless of Spark version):
val rowWithMean = df.rdd.map(row => (row, rowMean(row)))
And then fill the nulls and rebuild a DataFrame with the original schema:
val result = sqlContext.createDataFrame(rowWithMean.map{case (row,mean) => rowFillNulls(row,mean)}, df.schema)
Finally, compare the DataFrame before and after:
df.show
+---+----+----+----+---+---+
|  a|   b|   c|   d|  e|  f|
+---+----+----+----+---+---+
|  1|   2|   3|   4|  5|  6|
|  2|   2|   2|null|  4|  0|
|  3|null|null|null|  6|  0|
|  4|null|null|   4|  4|  0|
+---+----+----+----+---+---+
result.show
+---+---+---+---+---+---+
|  a|  b|  c|  d|  e|  f|
+---+---+---+---+---+---+
|  1|  2|  3|  4|  5|  6|
|  2|  2|  2|  2|  4|  0|
|  3|  3|  3|  3|  6|  0|
|  4|  3|  3|  4|  4|  0|
+---+---+---+---+---+---+
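As an alternative sketch, and not the row-based method used above, the same fill can probably be expressed with Spark's built-in column functions so that no Row objects are rebuilt by hand. The names `ColumnFillSketch` and `fillNullsWithRowMean` are hypothetical, and the sketch assumes every column is an Int column.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{coalesce, col, lit, when}

object ColumnFillSketch {
  // Assumes every column of `df` is an Int column, as in the example above.
  def fillNullsWithRowMean(df: DataFrame): DataFrame = {
    val cols: Seq[Column] = df.columns.toSeq.map(name => col(name))
    // Sum of the row, counting nulls as 0.
    val total = cols.map(c => coalesce(c, lit(0))).reduce(_ + _)
    // Number of non-null values in the row.
    val count = cols.map(c => when(c.isNull, 0).otherwise(1)).reduce(_ + _)
    // Truncated mean; left as null when the whole row is null (count == 0).
    val mean = when(count > 0, (total / count).cast("int"))
    // Keep each value where present, otherwise use the row mean.
    df.select(df.columns.toSeq.map(name => coalesce(col(name), mean).as(name)): _*)
  }
}
```

Calling `ColumnFillSketch.fillNullsWithRowMean(df).show` should give the same table as `result.show`. Unlike `rowMean`, this version leaves an all-null row as nulls rather than filling it with 0.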
The rowMean/rowFillNulls approach works for a DataFrame of any width, as long as every column is an Int. You can adapt it to other data types, even non-numeric ones (hint: inspect df.schema to decide how each column should be handled).
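One way to follow that hint, sketched under assumptions: read the Int column positions from the schema, average only those, and pass every other column through untouched. The names `SchemaFillSketch`, `intColumnIndices` and `fillIntNulls` are hypothetical helpers, and leaving a row unchanged when it has no non-null Int value is an arbitrary choice.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructType}

object SchemaFillSketch {
  // Positions of the Int columns, read from the DataFrame schema.
  def intColumnIndices(schema: StructType): Seq[Int] =
    schema.fields.zipWithIndex.collect {
      case (field, i) if field.dataType == IntegerType => i
    }.toSeq

  // Fill nulls in the Int columns with the truncated mean of the row's
  // non-null Int values; all other columns are copied as-is.
  def fillIntNulls(row: Row, intIdx: Seq[Int]): Row = {
    val present = intIdx.filterNot(row.isNullAt).map(row.getInt)
    if (present.isEmpty) row // nothing to average, so leave the row alone
    else {
      val mean = present.sum / present.length
      val isIntColumn = intIdx.toSet
      Row.fromSeq((0 until row.length).map { i =>
        if (isIntColumn(i) && row.isNullAt(i)) mean else row.get(i)
      })
    }
  }
}
```

With these helpers, something like `sqlContext.createDataFrame(df.rdd.map(r => SchemaFillSketch.fillIntNulls(r, idx)), df.schema)`, where `idx = SchemaFillSketch.intColumnIndices(df.schema)`, would fill a mixed-type DataFrame. Other numeric types would need their own getter and mean type.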