Pandas filter dataframe for positive and negative values

Question

I have a pandas dataframe with 3 columns where:

Category dtype - string
Date dtype - datetime

Values dtype - float

df = pd.DataFrame()
df['category'] = ['a', 'b', 'b', 'b', 'c', 'a', 'b', 'c', 'c', 'a']
df['date'] = ['2018-01-01', '2018-01-01', '2018-01-03', '2018-01-05', '2018-01-01', '2018-01-02', '2018-01-06', '2018-01-03', '2018-01-04','2018-01-01']
df['values'] = [1, 2, -1.5, 2.3, 5, -0.7, -5.2, -5.2, 1, -1.1]
df

Dataframe view

I want to filter for rows that have a positive value and a negative value (with the least difference) close to that date per category.

So, essentially an output that looks like:

df = pd.DataFrame()
df['category'] = ['a', 'a','b', 'b', 'c', 'c']
df['date'] = ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-03', '2018-01-01', '2018-01-03']
df['values'] = [1, -1.1, 2, -1.5, 5, -5.2]
df

Filtered Dataframe

I have looked at similar queries on SO (Identifying closest value in a column for each filter using Pandas, How do I find the closest values in a Pandas series to an input number?)

The first one utilises idxmin, which returns first occurence, not the closest in value.

The second link is speaking about a specific value as an input - I don't think a pure np.argsort works in my case.

I can imagine using a complex web of if statements to do this, but, I'm not sure what the most efficient way of doing this is with pandas.

Any guidance would be greatly appreciated.

Welcome to StackOverflow. Please take the time to read this post on how to provide a great pandas example as well as how to provide a minimal, complete, and verifiable example and revise your question accordingly. These tips on how to ask a good question may also be useful. — jezrael

Scott Boston Scott Boston · Accepted Answer · 2018-09-17T13:03:59

IIUC, sort your dataframe first then use idxmin:

df1 = df.sort_values(['category','date'])
df1[df1.groupby('category')['values']\
       .transform(lambda x: x.index.isin([x.ge(0).idxmin(), x.lt(0).idxmin()]))]

Output:

  category        date  values
0        a  2018-01-01     1.0
9        a  2018-01-01    -1.1
1        b  2018-01-01     2.0
2        b  2018-01-03    -1.5
4        c  2018-01-01     5.0
7        c  2018-01-03    -5.2

Pandas filter dataframe for positive and negative values

1 Answers