Check if all conditions are True by group based on start date in pandas

Question

I have a dataframe which looks like this:

pd.DataFrame({'a': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust3', 'cust3', 'cust3', 'cust4', 'cust4'],
                   'year': [2017, 2018, 2019, 2018, 2019, 2017, 2018, 2019, 2018, 2019],
                   'cond': [True, True, False, True, True, True, True, True, True, True],
                   'startDate': [2017, 2017, 2017, 2018, 2018, 2017, 2017, 2017, 2017, 2017]})

endYear = 2019

       a  year   cond  startDate
0  cust1  2017   True       2017
1  cust1  2018   True       2017
2  cust1  2019  False       2017
3  cust2  2018   True       2018
4  cust2  2019   True       2018
5  cust3  2017   True       2017
6  cust3  2018   True       2017
7  cust3  2019   True       2017
8  cust4  2018   True       2017
9  cust4  2019   True       2017

For each group in column 'a' I need to check if column 'cond' has "True" value for all years between column 'startDate' (can be different for each group) and 'endYear'.

My resulting dataframe should look like this:

       a  final_score
0  cust1        False
1  cust2         True
2  cust3         True
3  cust4        False

Logic:

cust1 = False as year 2019 value = False

cust2 = True as startDate 2018 and for all years between 2018 and 2019 'cond' is True

cust3 = True as startDate 2017 and for all years between 2017 and 2019 'cond' is True

cust4 = False as startDate 2017 but no data for 2017 so condition not met

Henry Yik Henry Yik · Accepted Answer · 2020-07-26T12:23:14

IIUC, you could use reindex to fill up the empty years by group then check for True:

print (df.groupby("a").apply(lambda d: d.set_index("year").reindex(range(d["startDate"].min(), endYear+1))["cond"].eq(True).all()))

a
cust1    False
cust2     True
cust3     True
cust4    False
dtype: bool

Check if all conditions are True by group based on start date in pandas

2 Answers