I will expand on @User's generic solution to provide a drop-free alternative. This is for folks directed here by the question's title (not the OP's specific problem).
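Say you want to delete all rows that contain a negative value. A one-liner that keeps only the rows whose values are all positive (so rows containing a zero are dropped as well) is: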
df = df[(df > 0).all(axis=1)]
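A minimal self-contained sketch of the one-liner on a tiny made-up frame (the column names and values are hypothetical, chosen only for illustration), also showing the >= 0 variant for when zeros should survive:

```python
import pandas as pd

# Hypothetical toy data: row 1 has a negative value, row 2 has a zero
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [4.0, 5.0, 0.0]})

# Keep only rows where every value is strictly positive (zeros are dropped too)
positive_only = df[(df > 0).all(axis=1)]

# Keep rows with no negative values (zeros are kept)
non_negative = df[(df >= 0).all(axis=1)]

print(positive_only.index.tolist())  # [0]
print(non_negative.index.tolist())   # [0, 2]
```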
Step-by-step explanation:
Let's generate a 5x5 DataFrame of random values drawn from a standard normal distribution:
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))
A B C D E
0 1.764052 0.400157 0.978738 2.240893 1.867558
1 -0.977278 0.950088 -0.151357 -0.103219 0.410599
2 0.144044 1.454274 0.761038 0.121675 0.443863
3 0.333674 1.494079 -0.205158 0.313068 -0.854096
4 -2.552990 0.653619 0.864436 -0.742165 2.269755
Let the condition be that a value is positive. The boolean DataFrame marking which elements satisfy it:
df > 0
A B C D E
0 True True True True True
1 False True False False True
2 True True True True True
3 True True False True False
4 False True True False True
Calling all(axis=1) reduces this to a boolean Series with one entry per row. A row is True only if every element in it satisfies the condition; if any element fails, the whole row is marked False:
(df > 0).all(axis=1)
0 True
1 False
2 True
3 False
4 False
dtype: bool
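The row-wise all() can equivalently be written as the negation of a row-wise any() over the opposite condition (De Morgan's law). A small sketch on the same seeded frame to check that both masks agree; the variable names are just for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 5), columns=list("ABCDE"))

# True where every value in the row is positive
all_positive = (df > 0).all(axis=1)

# Same mask built the other way round: no value in the row is <= 0
none_non_positive = ~(df <= 0).any(axis=1)

print(all_positive.equals(none_non_positive))  # True
print(all_positive.tolist())  # [True, False, True, False, False]
```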
Finally, use that Series as a boolean mask to select the matching rows of the DataFrame:
df[(df > 0).all(axis=1)]
A B C D E
0 1.764052 0.400157 0.978738 2.240893 1.867558
2 0.144044 1.454274 0.761038 0.121675 0.443863
Note that this only filters: it returns a new DataFrame and leaves df unchanged. Assign the result back to df to actually delete the rows:
df = df[(df > 0).all(axis=1)]
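A quick self-contained check of the filter-versus-reassign distinction on the same seeded frame (the name filtered is only for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 5), columns=list("ABCDE"))

# Boolean indexing builds a new DataFrame; df itself keeps all 5 rows
filtered = df[(df > 0).all(axis=1)]
print(len(df), len(filtered))  # 5 2

# Rebinding the name df is what actually "deletes" the rows
df = df[(df > 0).all(axis=1)]
print(df.index.tolist())  # [0, 2]
```

The original row labels (0 and 2) are kept; call reset_index(drop=True) afterwards if you want a fresh 0..n-1 index.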
The same pattern extends easily to dropping rows that contain NaNs (missing values):
df = df[(~df.isnull()).all(axis=1)]
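A small sketch, on a hypothetical toy frame, showing that this mask gives the same result as notna() (the direct inverse of isnull(), so no ~ is needed) and as the built-in dropna(), which by default removes any row containing at least one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing values in rows 1 and 2
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# Mask-based version from above
masked = df[(~df.isnull()).all(axis=1)]

# notna() is the inverse of isnull(), so the tilde is not needed
masked_notna = df[df.notna().all(axis=1)]

# Built-in equivalent: drop rows that contain any NaN
dropped = df.dropna()

print(masked.equals(dropped), masked_notna.equals(dropped))  # True True
```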
For a condition on a single column it simplifies further. For example, to delete all rows where column E is negative (or zero):
df = df[df.E > 0]
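Conditions on several columns can be combined with & (and) or | (or). A minimal sketch on the seeded frame; each comparison needs its own parentheses because & binds more tightly than >:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 5), columns=list("ABCDE"))

# Single-column condition: keep rows where E is positive
e_positive = df[df.E > 0]

# Two columns: parentheses are required around each comparison
a_and_e_positive = df[(df.A > 0) & (df.E > 0)]

print(e_positive.index.tolist())        # [0, 1, 2, 4]
print(a_and_e_positive.index.tolist())  # [0, 2]
```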
I would like to end with some profiling stats showing that @User's drop solution is slower than plain column-based filtering:
%timeit df_new = df[(df.E>0)]
345 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit dft.drop(dft[dft.E < 0].index, inplace=True)
890 µs ± 94.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
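If you want to rerun the comparison outside IPython, here is a rough sketch using the standard timeit module. Two caveats about the %timeit lines: dft is not defined in this answer (the sketch simply uses df), and with inplace=True the first loop already removes the negative rows, so later loops have nothing left to drop. The sketch uses a non-inplace drop so every run sees the same frame, and compares against E >= 0 so both approaches keep exactly the same rows. The function names are hypothetical, and no timings are claimed here; numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 5), columns=list("ABCDE"))


def by_mask(frame):
    # Build one boolean mask on column E and select rows with it
    return frame[frame.E >= 0]


def by_drop(frame):
    # Filter to find the offending rows, then drop them by index label
    return frame.drop(frame[frame.E < 0].index)


# Both approaches keep exactly the same rows
assert by_mask(df).equals(by_drop(df))

N = 200  # calls per timing run
for fn in (by_mask, by_drop):
    # Best of 5 runs; non-inplace, so df is never modified
    best = min(timeit.repeat(lambda: fn(df), number=N, repeat=5))
    print(f"{fn.__name__}: {best / N * 1e6:.1f} µs per call")
```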
A column is essentially a Series backed by a NumPy array, so it can be indexed cheaply. For folks interested in how the underlying memory layout affects execution speed, here is a great link on speeding up pandas:
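To illustrate the NumPy backing, here is a sketch that computes the same row mask directly on the underlying array and uses the resulting plain boolean ndarray to select rows. Whether this is actually faster depends on frame size and dtypes; for mixed-dtype frames to_numpy() has to build a copy with a common (possibly object) dtype, which can cost more than it saves.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 5), columns=list("ABCDE"))

# Pull out the underlying data as a 2-D float64 NumPy array
values = df.to_numpy()

# Row mask computed with plain NumPy instead of pandas comparisons
mask = (values > 0).all(axis=1)  # numpy.ndarray of bool, shape (5,)

# A boolean ndarray of the right length works as a row selector
result = df[mask]
print(result.index.tolist())  # [0, 2]

# A single column's data is also exposed as a NumPy array
print(type(df["E"].to_numpy()))  # <class 'numpy.ndarray'>
```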