76
votes

Here is how I encountered the error:

df.loc[a_list][df.a_col.isnull()]

a_list is an Int64Index holding row index labels, all of which belong to df.

The df.a_col.isnull() part is a condition I need for filtering.

If I execute the following commands individually, I do not get any warnings:

df.loc[a_list]
df[df.a_col.isnull()]

But if I chain them as df.loc[a_list][df.a_col.isnull()], I get the following warning (although the result is still returned):

Boolean Series key will be reindexed to match DataFrame index

What does this warning mean? Does it affect the result that is returned?

Do you still get it when you do this? df.loc[a_list.tolist()] - Mohammad Yusuf
@MYGz I updated the question, sorry for the mistake. - Cheng
What are you trying to achieve? df.loc[a_list] may no longer have the same length as df.a_col.isnull(), which is why you are getting the warning. - Psidom
@Psidom I want to apply two conditions to df: 1. pick out the rows in a_list, and 2. among those, find the rows where a_col is null. - Cheng

1 Answer

89
votes

Your approach will work despite the warning, but it's best not to rely on implicit, unclear behavior.

Solution 1, make the selection of indices in a_list a boolean mask:

df[df.index.isin(a_list) & df.a_col.isnull()]

Solution 2, do it in two steps:

df2 = df.loc[a_list]
df2[df2.a_col.isnull()]

Solution 3, if you want a one-liner, use query with the trick that NaN is never equal to itself:

df.loc[a_list].query('a_col != a_col')
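To check that the three solutions agree, here is a minimal sketch on made-up data. Only the names df, a_col and a_list come from the question; the values and index labels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five rows, two of them null in a_col.
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=[10, 11, 12, 13, 14],
)
a_list = pd.Index([11, 12, 13])  # row labels that all exist in df

# Solution 1: one boolean mask aligned with df itself.
r1 = df[df.index.isin(a_list) & df.a_col.isnull()]

# Solution 2: select the rows first, then build the mask from the subset.
df2 = df.loc[a_list]
r2 = df2[df2.a_col.isnull()]

# Solution 3: NaN != NaN evaluates to True, so only null rows pass.
r3 = df.loc[a_list].query("a_col != a_col")

print(r1.index.tolist())
print(r1.equals(r2), r2.equals(r3))
```

One difference to keep in mind: Solution 1 returns rows in the order they appear in df, while Solutions 2 and 3 follow the order of a_list. With a sorted a_list without duplicates, as here, the results are the same.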

The warning arises because the boolean Series df.a_col.isnull() has the same length and index as df, while df.loc[a_list] contains only the rows in a_list and is therefore typically shorter. As a result, some index labels in df.a_col.isnull() are not present in df.loc[a_list].

What pandas does is reindex the boolean Series on the index of the calling DataFrame. In effect, it takes from df.a_col.isnull() only the values whose labels are in a_list. This works, but the behavior is implicit and could change in a future version, which is what the warning is about.
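The reindexing can also be written out explicitly, which avoids the warning. This is a rough sketch on the same kind of made-up data (the values are assumptions, not from the question); it compares the explicit version with the implicit one while silencing the warning for the comparison only.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data; only df, a_col and a_list are names from the question.
df = pd.DataFrame(
    {"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=[10, 11, 12, 13, 14],
)
a_list = pd.Index([11, 12, 13])

subset = df.loc[a_list]
mask = df.a_col.isnull()  # indexed like df, so longer than subset

# Align the mask to the subset's index by hand before filtering.
explicit = subset[mask.reindex(subset.index)]

# The original chained form, with the UserWarning suppressed.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    implicit = subset[mask]

print(len(mask), len(subset))
print(explicit.equals(implicit))
```

Because every label in subset also exists in mask, the reindexed mask has no missing values and stays boolean.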