I have a pandas dataframe with 3 columns where:
- Category dtype - string
- Date dtype - datetime
Values dtype - float
df = pd.DataFrame() df['category'] = ['a', 'b', 'b', 'b', 'c', 'a', 'b', 'c', 'c', 'a'] df['date'] = ['2018-01-01', '2018-01-01', '2018-01-03', '2018-01-05', '2018-01-01', '2018-01-02', '2018-01-06', '2018-01-03', '2018-01-04','2018-01-01'] df['values'] = [1, 2, -1.5, 2.3, 5, -0.7, -5.2, -5.2, 1, -1.1] df
I want to filter for rows that have a positive value and a negative value (with the least difference) close to that date per category.
So, essentially an output that looks like:
df = pd.DataFrame()
df['category'] = ['a', 'a','b', 'b', 'c', 'c']
df['date'] = ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-03', '2018-01-01', '2018-01-03']
df['values'] = [1, -1.1, 2, -1.5, 5, -5.2]
df
I have looked at similar queries on SO (Identifying closest value in a column for each filter using Pandas, How do I find the closest values in a Pandas series to an input number?)
The first one utilises idxmin
, which returns first occurence, not the closest in value.
The second link is speaking about a specific value as an input - I don't think a pure np.argsort
works in my case.
I can imagine using a complex web of if statements to do this, but, I'm not sure what the most efficient way of doing this is with pandas.
Any guidance would be greatly appreciated.