There are a few approaches to shortening the syntax for this in Pandas, at least until it gets a full query API (perhaps I'll try to join the GitHub project and work on this if time permits and no one else has already started).
One method to shorten the syntax a little is given below:
inds = df.apply(lambda x: x["A"] > 10 and x["B"] < 5, axis=1)  # one boolean per row
print(df[inds].to_string())
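A vectorized alternative that avoids calling a Python lambda once per row is to build the boolean mask from whole-column comparisons. This is a different technique from the `apply` call above, not the answer's own method, and the sample frame below is invented for illustration. Later pandas releases also added `DataFrame.query`, which accepts the condition as a string expression.

```python
import pandas as pd

# Illustrative data only: column names follow the question, values are made up.
df = pd.DataFrame({"A": [5, 12, 20, 15, 8], "B": [1, 3, 7, 4, 9]})

# Comparing a whole column yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than > and <.
mask = (df["A"] > 10) & (df["B"] < 5)
print(df[mask].to_string())

# DataFrame.query expresses the same condition as a string.
print(df.query("A > 10 and B < 5").to_string())
```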
To fully solve this, one would need to build something like SQL's SELECT and WHERE clauses into Pandas. That is not trivial, but one approach that I think could work is to use Python's standard-library operator module, which lets you treat comparisons such as greater-than as functions instead of symbols. So you could do the following:
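To make the "comparisons as functions" idea concrete, here is a minimal illustration using plain numbers; nothing in it is specific to pandas.

```python
import operator

# operator.gt(a, b) is the function form of a > b; operator.lt(a, b) is a < b.
print(operator.gt(12, 10))  # True
print(operator.lt(7, 5))    # False

# Because they are ordinary functions, they can be stored in data structures
# and chosen at run time, which is what the dictionary-based selector relies on.
checks = [(operator.gt, 12, 10), (operator.lt, 3, 5)]
print(all(op(a, b) for op, a, b in checks))  # True
```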
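from functools import reduce  # a built-in in Python 2; must be imported in Python 3

def pandas_select(dataframe, select_dict):
    # For each row x, apply every (operator, value) pair to x[key] and AND the results.
    inds = dataframe.apply(lambda x: reduce(lambda v1, v2: v1 and v2,
                                            [op(x[key], value)
                                             for key, (op, value) in select_dict.items()]),
                           axis=1)
    return dataframe[inds]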
A test like the one in your question would then look like this:
import operator
select_dict = {
"A":(operator.gt,10),
"B":(operator.lt,5)
}
print(pandas_select(df, select_dict).to_string())
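Because `operator.gt(series, 10)` works on a whole column and returns a boolean Series, the same dictionary can also drive a column-wise version that skips the per-row `apply`. The sketch below is a hypothetical variant (`pandas_select_vectorized` is an illustrative name, and the data is invented); it assumes `select_dict` contains at least one condition, since `reduce` has no initial value here.

```python
import operator
from functools import reduce

import pandas as pd


def pandas_select_vectorized(dataframe, select_dict):
    """Hypothetical column-wise variant of pandas_select.

    select_dict maps a column name to an (operator_function, value) pair.
    Each operator is applied to the whole column at once, giving a boolean
    Series, and the Series are combined with element-wise AND (operator.and_).
    """
    masks = [op(dataframe[key], value) for key, (op, value) in select_dict.items()]
    return dataframe[reduce(operator.and_, masks)]


df = pd.DataFrame({"A": [5, 12, 20, 15, 8], "B": [1, 3, 7, 4, 9]})
result = pandas_select_vectorized(df, {"A": (operator.gt, 10), "B": (operator.lt, 5)})
print(result.to_string())
```

The call signature is unchanged, so the `select_dict` from the example above can be passed as-is.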
You can shorten the syntax even further, either by building more arguments into pandas_select to handle the common logical operators automatically, or by importing the operator functions into your namespace under shorter names.
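One way to get the shorter syntax is to map short operator strings onto the `operator` functions. The `OPS` table and the `select_where` helper below are hypothetical names for illustration, not part of pandas, and the data is invented. Passing conditions as keyword arguments only works for column names that are valid Python identifiers.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical lookup table: short operator strings -> operator-module functions.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def select_where(dataframe, **conditions):
    """Hypothetical helper: select_where(df, A=(">", 10), B=("<", 5)).

    Each keyword is a column name; each value is an (operator string, value)
    pair. All conditions are AND-ed together, column-wise.
    """
    masks = [OPS[sym](dataframe[col], value) for col, (sym, value) in conditions.items()]
    return dataframe[reduce(operator.and_, masks)]


df = pd.DataFrame({"A": [5, 12, 20, 15, 8], "B": [1, 3, 7, 4, 9]})
print(select_where(df, A=(">", 10), B=("<", 5)).to_string())
```

The other option mentioned above needs no helper at all: `from operator import gt, lt` lets you write `{"A": (gt, 10), "B": (lt, 5)}`.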
Note that the pandas_select function above only handles constraints joined by logical AND. For other logic you would have to modify it, or rewrite the condition using not and De Morgan's laws.
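Both options can be sketched briefly. `pandas_select_how` and its `how` argument are hypothetical additions, and the data is invented. The last lines check De Morgan's law on the masks: "A > 10 or B < 5" selects the same rows as the complement of "A <= 10 and B >= 5", so an AND-only selector plus a negation is enough.

```python
import operator
from functools import reduce

import pandas as pd


def pandas_select_how(dataframe, select_dict, how="and"):
    """Hypothetical extension of pandas_select with a `how` argument.

    how="and" keeps rows meeting every condition; how="or" keeps rows
    meeting at least one. Conditions are evaluated column-wise.
    """
    combine = {"and": operator.and_, "or": operator.or_}[how]
    masks = [op(dataframe[key], value) for key, (op, value) in select_dict.items()]
    return dataframe[reduce(combine, masks)]


df = pd.DataFrame({"A": [5, 12, 20, 15, 8], "B": [1, 3, 7, 4, 9]})
conds = {"A": (operator.gt, 10), "B": (operator.lt, 5)}
either = pandas_select_how(df, conds, how="or")

# De Morgan: (A > 10) or (B < 5)  ==  not ((A <= 10) and (B >= 5)).
# Only an AND is needed; ~ negates the boolean mask element-wise.
neither = (df["A"] <= 10) & (df["B"] >= 5)
print(either.equals(df[~neither]))  # True
```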