This solution is more hackish in terms of implementation, but I find it much cleaner in terms of usage, and it is certainly more general than the others proposed.
https://github.com/toobaz/generic_utils/blob/master/generic_utils/pandas/where.py
You don't need to download the entire repo: saving the file and doing
from where import where as W
should suffice. Then you use it like this:
import pandas as pd

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])
# On specific column:
print(df.loc[W['a'] > 2])
print(df.loc[-W['a'] == W['b']])
print(df.loc[~W['c']])
# On the entire DataFrame, or on a subset of its columns:
print(df.loc[W.sum(axis=1) > 3])
print(df.loc[W[['a', 'b']].diff(axis=1)['b'] > 1])
A slightly less stupid usage example:
data = pd.read_csv('ugly_db.csv').loc[~(W == '$null$').any(axis=1)]
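For comparison, here is a rough sketch of the same kind of selections using only the callable form that .loc accepts natively, without the W helper. The lambda receives the DataFrame being indexed. The names gt2, not_c, wide_gap, raw and clean are made up for illustration, and raw is a hypothetical stand-in for the contents of ugly_db.csv.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])

# A callable passed to .loc is called with the DataFrame being indexed,
# so the frame's variable name need not be repeated inside the brackets.
gt2 = df.loc[lambda d: d['a'] > 2]
not_c = df.loc[lambda d: ~d['c']]
wide_gap = df.loc[lambda d: d[['a', 'b']].diff(axis=1)['b'] > 1]

# Hypothetical in-memory stand-in for the CSV file (assumption, not real data).
raw = pd.DataFrame({'x': ['1', '$null$', '3'],
                    'y': ['p', 'q', '$null$']})

# Drop every row that contains the '$null$' marker in any column.
clean = raw.loc[lambda d: ~(d == '$null$').any(axis=1)]
```

This is more verbose than the W syntax, which is the point of the helper, but it works on a stock pandas install.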
By the way: even when you are just using boolean columns,
df.loc[W['cond1']].loc[W['cond2']]
can be much more efficient than
df.loc[W['cond1'] & W['cond2']]
because it evaluates cond2 only where cond1 is True.
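A small sketch of that evaluation order, written with plain .loc callables rather than W (so it only illustrates the idea and is not how where.py is implemented). The functions cond1 and cond2 and the two log lists are hypothetical names; each function records how many rows it was asked to evaluate.

```python
import pandas as pd

df = pd.DataFrame({'cond1': [True, False, True, False],
                   'cond2': [True, True, False, False]})

log = []  # (condition name, number of rows it was evaluated on)

def cond1(d):
    log.append(('cond1', len(d)))
    return d['cond1']

def cond2(d):
    log.append(('cond2', len(d)))
    return d['cond2']

# Chained: cond2 only ever sees the rows that survived cond1.
chained = df.loc[cond1].loc[cond2]
chained_log = list(log)

log.clear()

# Combined: both conditions are evaluated on the full frame.
combined = df.loc[lambda d: cond1(d) & cond2(d)]
combined_log = list(log)
```

Both forms select the same rows; the difference is that in the chained form cond2 is computed on two rows instead of four. Whether that saves noticeable time depends on how expensive the second condition is.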
DISCLAIMER: I first gave this answer elsewhere because I hadn't seen this.
df.query and pd.eval seem like good fits for this use case. For information on the pd.eval() family of functions, their features and use cases, please visit Dynamic Expression Evaluation in pandas using pd.eval(). - cs95
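As a minimal illustration of what the comment suggests, the sketch below runs two of the column filters through df.query and one expression through df.eval, the DataFrame-level member of the pd.eval family. The result names big_a, both and total are made up for the example.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, True],
                   [3, 4, False],
                   [5, 7, True]],
                  index=range(3), columns=['a', 'b', 'c'])

# Column names are resolved inside the expression string.
big_a = df.query('a > 2')
both = df.query('a > 2 and b < 7')

# df.eval evaluates an expression over the columns and returns the result.
total = df.eval('a + b')
```

Unlike W, this approach passes the condition as a string, so it covers expressions over columns rather than arbitrary method chains.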