2
votes

I'm new to Python. I want to find out which column in my dataframe has the most missing values. Let's say we have 5 rows and 1000 columns.
For example:

C1    C2    ...   C1000  
10    21    ...   NaN  
NaN   45    ...   29  
15    21    ...   NaN  
21    NaN   ...   27  
61    NaN   ...   NaN 

C1000 has the most missing values, so my code should return the column name "C1000".

1 Answer

5
votes

You could use df.count().idxmin(). df.count() returns a Series with the number of non-NA/null observations in each column, and idxmin() gives you the label of the column with the fewest non-NA/null values, that is, the one with the most missing values.

In [12]: df
Out[12]:
     C1    C2  C1000
0  10.0  21.0    NaN
1   NaN  45.0   29.0
2  15.0  21.0    NaN
3  21.0   NaN   27.0
4  61.0   NaN    NaN

In [13]: df.count()
Out[13]:
C1       4
C2       3
C1000    2
dtype: int64

In [14]: df.count().idxmin()
Out[14]: 'C1000'
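
If you would rather count the missing values directly, an equivalent approach is df.isna().sum().idxmax(). Here is a minimal, self-contained sketch of that. The frame is rebuilt from the three columns shown above, and the variable names missing_per_column and most_missing are only illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in for the asker's frame (values copied from the example above).
df = pd.DataFrame({
    "C1": [10, np.nan, 15, 21, 61],
    "C2": [21, 45, 21, np.nan, np.nan],
    "C1000": [np.nan, 29, np.nan, 27, np.nan],
})

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# Label of the column with the most missing values.
most_missing = missing_per_column.idxmax()

# Counting non-missing values and taking the minimum gives the same label.
assert most_missing == df.count().idxmin()

print(most_missing)
```

Note that if several columns tie for the most missing values, idxmax() (like idxmin()) returns only the first of them.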