import numpy as np
data = [
(1, 1, None),
(1, 2, float(5)),
(1, 3, np.nan),
(1, 4, None),
(1, 5, float(10)),
(1, 6, float("nan")),
(1, 6, float("nan")),
]
df = spark.createDataFrame(data, ("session", "timestamp1", "id2"))  # assumes an active SparkSession named `spark`
Expected output: a DataFrame with the count of NaN/null values for each column.

Note: the previous questions I found on Stack Overflow only check for null, not NaN, which is why I have created a new question.
I know I can use the isnull() function in Spark to find the number of null values in a Spark column, but how do I find NaN values in a Spark DataFrame?
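For illustration, here is a minimal sketch of the kind of per-column count I am after, assuming pyspark.sql.functions.isnan together with Column.isNull (this only works because all columns in the example are numeric; isnan is not defined for string or timestamp columns):

from pyspark.sql.functions import col, count, isnan, when

# For each column, flag rows that are either NaN or null and count the flagged rows.
# count() skips nulls, so when(...) returning null for non-matching rows excludes them.
nan_null_counts = df.select(
    [count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns]
)
nan_null_counts.show()
# For the example data above this should report:
# session = 0, timestamp1 = 0, id2 = 5  (2 nulls + 3 NaNs)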