Pandas how to get the first and last dates of all non-zero value ranges in a data frame?

Question

I have a data frame that looks as follows:

df = pd.DataFrame({"A":[10,0,30,40,0,60,70,80,90]}, index = pd.date_range(start='1/1/2020', end='1/09/2020'))
df


            A
2020-01-01  10
2020-01-02  0
2020-01-03  30
2020-01-04  40
2020-01-05  0
2020-01-06  60
2020-01-07  70
2020-01-08  80
2020-01-09  90

I want to loop over the data frame to get t0 and t1 which represent the first and last dates, respectively (i.e. index value) for each range of non-zero values of A.

In the table above, I want to get the following values for t0 and t1:

t0 = 2020-01-01 , t1 = 2020-01-01

t0 = 2020-01-03 , t1 = 2020-01-04

t0 = 2020-01-06 , t1 = 2020-01-09

Is there a simple way to do this within Pandas?

Henry Ecker Henry Ecker · Accepted Answer · 2021-06-14T22:38:39

One option would be to create a Boolean Index based on where A=0. Then groupby the cumsum of that Index and aggregate the first and last dates (assumes index is sorted as it is here):

new_df = df.reset_index()
m = new_df['A'].eq(0)
new_df = (
    new_df.groupby(m.cumsum()[~m])
        .agg(t0=('index', 'first'), t1=('index', 'last'))
        .reset_index(drop=True)
)

          t0         t1
0 2020-01-01 2020-01-01
1 2020-01-03 2020-01-04
2 2020-01-06 2020-01-09

An alternative to grab min and max index values instead of first and last:

new_df = df.reset_index()
m = new_df['A'].eq(0)
new_df = (
    new_df.groupby(m.cumsum()[~m])
        .agg(t0=('index', 'min'), t1=('index', 'max'))
        .reset_index(drop=True)
)

          t0         t1
0 2020-01-01 2020-01-01
1 2020-01-03 2020-01-04
2 2020-01-06 2020-01-09

Indexing:

m:

0    False
1     True
2    False
3    False
4     True
5    False
6    False
7    False
8    False
Name: A, dtype: bool

Create Groups Separated by where 0s are:

m.cumsum()

0    0
1    1
2    1
3    1
4    2
5    2
6    2
7    2
8    2
Name: A, dtype: int32

Self filtered to exclude 0 rows:

m.cumsum()[~m]

0    0
2    1
3    1
5    2
6    2
7    2
8    2
Name: A, dtype: int32

Then group these rows together to determine associated dates.

Pandas how to get the first and last dates of all non-zero value ranges in a data frame?

2 Answers