I have two input columns in a dataframe, and I need to create an output column that depends on both inputs (same row, different columns) and also on its own previous value (same column, previous row).
The following code creates an example of what I need:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 1], [0, 1, 1],
     [0, 0, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1], [1, 1, 1], [0, 1, 1],
     [0, 1, 1], [1, 1, 1], [0, 1, 1], [0, 1, 1], [0, 0, 0], [0, 1, 0]],
    columns=['input_1', 'input_2', 'output'],
)
```
The rules are simple:
- If input_1 is 1, output is 1 (input_1 acts as a trigger).
- Once output is 1, it stays 1 as long as input_2 is also 1 (input_2 acts as a kind of memory).
- In all other cases, output is 0.
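The three rules can be pinned down as a plain Python loop. This is a minimal sketch, assuming the output before the first row is 0; `latch_loop` is a hypothetical name. It iterates over plain sequences instead of calling `iloc` per row, but it is still a row-by-row loop, so it is mainly useful as a reference to check faster versions against.

```python
def latch_loop(input_1, input_2):
    """Apply the trigger/memory rules row by row.

    Assumes the output "before" row 0 is 0.
    """
    output = []
    prev = 0
    for trigger, memory in zip(input_1, input_2):
        # Rule 1: a trigger (input_1 == 1) sets the output to 1.
        # Rule 2: otherwise a previous 1 is kept only while input_2 is 1.
        # Rule 3: everything else is 0.
        prev = 1 if trigger == 1 or (prev == 1 and memory == 1) else 0
        output.append(prev)
    return output


input_1 = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
input_2 = [0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
print(latch_loop(input_1, input_2))
# → [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

With the example DataFrame, `latch_loop(df['input_1'], df['input_2'])` should reproduce the `output` column.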
The rows are in chronological order, so each row's output influences the next one: row 0's output influences row 1's, row 1's output influences row 2's, and so on. In other words, output depends on input_1 and input_2, but also on its own previous value.
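Written as a recurrence, with $o_t$ the output, $a_t$ the value of input_1 and $b_t$ the value of input_2 in row $t$ (assuming the output before the first row is 0), the rules read:

$$
o_t = a_t \lor \left(o_{t-1} \land b_t\right), \qquad o_{-1} = 0
$$

The $o_{t-1}$ term is what prevents a plain element-wise expression over the two input columns.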
I could code this as a loop over the dataframe, computing and assigning values with iloc, but it is painfully slow. I need to run this over many thousands of rows in each of tens of thousands of dataframes, so I am looking for the most efficient way to do it (preferably vectorized). NumPy or any other library or method you know is fine.
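One way to remove the dependence on the previous row, sketched here with NumPy under the assumption that the output before row 0 is 0: a row where both inputs are 0 always forces the output to 0 (call it a "reset"). Between two resets every row has input_1 or input_2 equal to 1, so the output switches to 1 at the first trigger and then stays 1. Hence the output is 1 exactly when the most recent trigger at or before a row is newer than the most recent reset. `latch_vectorized` is a hypothetical name, and this is one possible formulation rather than the only one.

```python
import numpy as np


def latch_vectorized(input_1, input_2):
    """Vectorized trigger/memory rules (sketch).

    A row with input_1 == 0 and input_2 == 0 is a "reset" (output 0).
    output[t] is 1 exactly when the latest trigger (input_1 == 1) at or
    before t comes after the latest reset. Assumes output before row 0 is 0.
    """
    a = np.asarray(input_1) == 1
    b = np.asarray(input_2) == 1
    idx = np.arange(a.size)

    reset = ~a & ~b
    # Index of the most recent trigger / reset up to each row (-1 if none yet).
    last_trigger = np.maximum.accumulate(np.where(a, idx, -1))
    last_reset = np.maximum.accumulate(np.where(reset, idx, -1))

    # A row cannot be both a trigger and a reset, so ties only occur at -1.
    return (last_trigger > last_reset).astype(np.int8)
```

Applied to the example, `latch_vectorized(df['input_1'], df['input_2'])` should reproduce the `output` column. Both passes are `ufunc.accumulate` calls, so the per-row work happens in compiled NumPy code instead of the Python interpreter.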
I searched and found some questions about vectorization and row looping, but I still don't see how to apply those techniques here. For example: How to iterate over rows in a DataFrame in Pandas? and What is the most efficient way to loop through dataframes with pandas?
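Row-iteration techniques still process one row at a time. For tens of thousands of dataframes, a further option is to batch them: if every dataframe has the same number of rows, the inputs can be stacked into 2-D arrays and processed in one call. This is a minimal sketch, assuming equal lengths and an output of 0 before each dataframe's first row; `latch_batched` is a hypothetical name. It relies on the observation that a row with both inputs 0 always resets the output, so the output is 1 exactly where the latest trigger is newer than the latest reset.

```python
import numpy as np


def latch_batched(input_1, input_2):
    """Apply the trigger/memory rules to many series at once (sketch).

    input_1, input_2: arrays of shape (n_frames, n_rows), one row per
    dataframe. Assumes all dataframes have the same length and that each
    one starts from output 0.
    """
    a = np.asarray(input_1) == 1
    b = np.asarray(input_2) == 1
    # Time index along each frame; broadcasts across the frame axis.
    idx = np.arange(a.shape[1])

    reset = ~a & ~b
    # Running "latest index" of triggers and resets along axis 1 (time).
    last_trigger = np.maximum.accumulate(np.where(a, idx, -1), axis=1)
    last_reset = np.maximum.accumulate(np.where(reset, idx, -1), axis=1)

    return (last_trigger > last_reset).astype(np.int8)
```

A batch could be built with something like `np.stack([d['input_1'].to_numpy() for d in frames])`, where `frames` is a hypothetical list of the dataframes. Frames of different lengths could be right-padded with zeros and the padding sliced off afterwards; padding at the end cannot change earlier outputs, because each output depends only on the current and previous rows.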
I appreciate your help