Best way to select rows and respective columns below a certain value in pandas

Question

I have a large dataframe with 30+ columns.

The first two columns ("A" and "B") contain general infromation about a feature, while the rest of columns represent different experiments for that feature.

I want to slice my dataframe to only contain rows with respective columns, in which any value was below a threshold, and furthermore include columns A and B.

My current code looks like this :

df=df.loc[df.le(0.05).any(1),].reindex(colums=["A","B"]+list(df.columns[df.le(0.05).any(0)]))

While it works, this looks very inelegant. Is there a "nicer" way to go about?

edit: Example

A	B	C	D	E
1	10	0.9	0.8	0.04
1	20	0.7	0.6	0.5
2	5	0.1	0.01	0.3

and as result

A	B	D	E
1	10	0.8	0.04
2	5	0.01	0.3

mozway mozway · Accepted Answer · 2021-08-11T16:20:50

Let's start with this input:

import numpy as np
np.random.seed(1)
df = pd.concat([pd.DataFrame({'A': ['feature']*5,
                              'B': ['feature']*5,}),
                pd.DataFrame({'col%s' % i: np.random.randint(0,5,5) for i in range(10)})
               ], axis=1)

input:

         A        B  col0  col1  col2  col3  col4
0  feature  feature     3     0     1     3     4
1  feature  feature     4     0     2     4     1
2  feature  feature     0     1     4     2     1
3  feature  feature     1     4     2     4     0
4  feature  feature     3     4     4     2     1

Now we select the rows where there is not at least one 0 (just an example, you can use any selector here):

df[df.drop(['A', 'B'], axis=1).le(0).any(axis=1)]

output:

         A        B  col0  col1  col2  col3  col4
0  feature  feature     3     0     1     3     4
1  feature  feature     4     0     2     4     1
2  feature  feature     0     1     4     2     1
3  feature  feature     1     4     2     4     0

Best way to select rows and respective columns below a certain value in pandas

2 Answers