Spark fillNa not replacing the null value

Question

I have the following dataset, which contains some null values. I need to replace the null values using fillna in Spark.

DataFrame:

df = spark.read.format("com.databricks.spark.csv").option("header", "true").load("/sample.csv")

>>> df.printSchema();
root
 |-- Age: string (nullable = true)
 |-- Height: string (nullable = true)
 |-- Name: string (nullable = true)

>>> df.show()
+---+------+-----+
|Age|Height| Name|
+---+------+-----+
| 10|    80|Alice|
|  5|  null|  Bob|
| 50|  null|  Tom|
| 50|  null| null|
+---+------+-----+

>>> df.na.fill(10).show()

When I fill the null values, nothing changes; the same DataFrame appears again:

+---+------+-----+
|Age|Height| Name|
+---+------+-----+
| 10|    80|Alice|
|  5|  null|  Bob|
| 50|  null|  Tom|
| 50|  null| null|
+---+------+-----+

I also tried storing the filled result in a new DataFrame, but it comes back unchanged as well.

>>> df2 = df.na.fill(10)

How can I replace the null values? Please show me the possible ways of doing this with fillna. Thanks in advance.

Are there any rules for the replacement? E.g., is replacing nulls in the Height column different from replacing them in the Name column? — eliasah
In my case the null values are not replaced whether or not I specify a rule. The basic fill operation is not working properly. I checked with different datasets. — Churchill vins

Mariusz Mariusz · Accepted Answer · 2016-11-03T08:39:22

It seems that your Height column is not numeric. When you call df.na.fill(10), Spark replaces nulls only in columns whose type matches the type of 10, that is, numeric columns. All of your columns are strings, so nothing is filled.

If the Height column needs to stay a string, you can try df.na.fill('10').show(); otherwise, casting it to IntegerType() is necessary.
