0 votes

I have a Spark dataframe and need to do a count of null/empty values for each column. I need to show ALL columns in the output. I have looked online and found a few "similar questions", but the solutions totally blew my mind, which is why I am posting here for help.

Here is what I have for code; I know this part of the puzzle.

from pyspark.sql import *

sf.isnull()

After running it, this is the error I receive: AttributeError: 'DataFrame' object has no attribute 'isnull'

What's interesting is that I did the same exercise with pandas and used df.isna().sum(), which worked great. What am I missing for pyspark?

Are you sure that a data frame (in pyspark.sql, not pandas) has such a method? From the documentation – Timus
This is where I am confused, I don't know. I clicked on your link and see pyspark.sql.Column.isNull. Then I went further, and as an example it shows filter being used. I have no clue what that even is. – wally
But a Column isn't a DataFrame: "Column: A column expression in a DataFrame"? – Timus
There is an answer here already. – Siddhant Tandon

1 Answer

0 votes

You can do the following; just make sure your df is a Spark DataFrame.

from pyspark.sql.functions import col, count, when

# Count the NULLs in each column: when() emits NULL where the condition is false,
# and count() ignores NULLs, so each aliased count is that column's NULL count.
df.select(*(count(when(col(c).isNull(), c)).alias(c) for c in df.columns)).show()
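Since the question also asks about empty values, here is a minimal sketch of one way to count both NULLs and empty strings per column. It assumes the columns being checked are string-typed; the sample data and column names are made up purely for illustration.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, when

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: two string columns with a mix of NULLs and empty strings
df = spark.createDataFrame(
    [("a", None), (None, "x"), ("", "y")],
    ["col1", "col2"],
)

# For each column, count rows where the value is NULL or an empty string.
# when() returns NULL where the condition is false and count() skips NULLs,
# so each aliased count ends up being the number of "missing" rows.
df.select(
    *(
        count(when(col(c).isNull() | (col(c) == ""), c)).alias(c)
        for c in df.columns
    )
).show()

On this sample the output would show 2 for col1 (one NULL, one empty string) and 1 for col2.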