How to vectorize a function that uses both row and column elements of a dataframe

Question

I have two inputs in a dataframe, and I need to create an output that depends on both inputs (same row, different columns), but also on its previous value (same column, previous row).

This dataframe command will create an example of what I need:

import pandas as pd

df = pd.DataFrame(
    [[0,0,0], [0,1,0], [0,0,0], [1,1,1], [0,1,1], [0,1,1],
     [0,0,0], [0,1,0], [0,1,0], [1,1,1], [1,1,1], [0,1,1],
     [0,1,1], [1,1,1], [0,1,1], [0,1,1], [0,0,0], [0,1,0]],
    columns=['input_1', 'input_2', 'output'])

The rules are simple:

If input_1 is 1, output is 1 (input_1 is a trigger function)
output will remain as 1 as long as input_2 is also 1. (input_2 works kind of like a memory function)
For all the others, output will be 0

The rows are in time order: the output of row 0 influences the output of row 1, row 1 influences row 2, and so on. So each output depends on input_1 and input_2 in the same row, but also on the output of the previous row.

I could loop through the dataframe, computing and assigning values with iloc, but that is painfully slow. I need to run this over many thousands of rows for tens of thousands of dataframes, so I am looking for the most efficient way to do it (preferably vectorized). NumPy or any other library or method is fine.
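For reference, the three rules can be written as a plain row-by-row loop. This is only a sketch of the slow baseline described above, not the asker's actual code: the name `latch_loop` and the `initial=0` starting value are assumptions made for illustration.

```python
def latch_loop(input_1, input_2, initial=0):
    """Row-by-row reference for the three rules.

    `initial` is an assumed value for the output before the first row.
    """
    out = []
    prev = initial
    for a, b in zip(input_1, input_2):
        if a == 1:
            prev = 1      # rule 1: input_1 triggers the output
        elif b == 1 and prev == 1:
            prev = 1      # rule 2: input_2 keeps an already-set output at 1
        else:
            prev = 0      # rule 3: everything else
        out.append(prev)
    return out


# First eight rows of the example dataframe from the question.
in1 = [0, 0, 0, 1, 0, 0, 0, 0]
in2 = [0, 1, 0, 1, 1, 1, 0, 1]
expected = [0, 0, 0, 1, 1, 1, 0, 0]
result = latch_loop(in1, in2)
```

A loop like this is useful mainly as a correctness check for any vectorized version, since it follows the rules literally.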

I searched and found some questions about vectorization and row looping, but I still don't see how to apply those techniques here. Examples: "How to iterate over rows in a DataFrame in Pandas?" and "What is the most efficient way to loop through dataframes with pandas?"

I appreciate your help

Rule number 4 conflicts with the original statement. You say that the output depends on inputs #1 and #2 and on its previous value. In rule #4 you say that the output also depends on the previous value of input #2. Please specify which of the statements is correct. — Sergey
Thanks for the clarification. In this case, rule #4 is unnecessary. Try writing the conditions as a three-bit number (input #1, input #2, prev. output) and translate your rules into the language of numbers. Rule #1: output is 1 if the number is greater than 3 (combinations 100, 101, 110, 111). Rule #2: output is 1 if the number is 3 (011). Rule #3: output is 0 in the remaining cases, that is, if the number is less than 3 (combinations 000, 001, 010). Rule #4 says that if the number is 2 (010), the output is 0, but we already know this from rule #3. — Sergey
Hi @Sergey. You have a point in translating the rules into numbers. However, "previous output" belongs to a different row, and that is exactly what I need to overcome in order to vectorize the solution. I don't want to put it in a horizontal rule, because that is not what I have. I will delete rule #4, as it repeats something already said. Thanks!!! — xiaxio
Hi @xiaxio, did I understand correctly that you initially only have two columns and zero as the initial output? — Sergey
Hi @Sergey. I have 'input_1' and 'input_2' columns. I need to generate the 'output' column. I cannot use it as you did in your answer. — xiaxio
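The three-bit encoding Sergey describes in the comments above can be tabulated with a few lines of Python. This is a sketch of the per-row rule only; `next_output` is a hypothetical helper name, and it still needs the previous output as an argument, which is the dependency the question is trying to vectorize away.

```python
def next_output(input_1, input_2, prev_output):
    """Combine the three bits into one number; the output is 1 when it is >= 3."""
    code = (input_1 << 2) | (input_2 << 1) | prev_output
    return 1 if code >= 3 else 0


# Truth table keyed by the bit pattern (input_1, input_2, prev_output).
table = {
    format(code, '03b'): next_output((code >> 2) & 1, (code >> 1) & 1, code & 1)
    for code in range(8)
}
for bits, out in table.items():
    print(bits, '->', out)
```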

Andrej Kesely · Accepted Answer · 2020-01-19T16:05:17

If I understand correctly, you want to know how to compute the output column. You can do this, for example:

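import numpy as np
# Sum: 2 = both inputs 1 (set), 0 = both 0 (reset), 1 = exactly one input is 1 (NaN, then forward-filled)
df['output_2'] = (df['input_1'] + df['input_2']).replace(1, np.nan).ffill().replace(2, 1).astype(int)
print(df)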

Prints:

    input_1  input_2  output  output_2
0         0        0       0         0
1         0        1       0         0
2         0        0       0         0
3         1        1       1         1
4         0        1       1         1
5         0        1       1         1
6         0        0       0         0
7         0        1       0         0
8         0        1       0         0
9         1        1       1         1
10        1        1       1         1
11        0        1       1         1
12        0        1       1         1
13        1        1       1         1
14        0        1       1         1
15        0        1       1         1
16        0        0       0         0
17        0        1       0         0
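Two cases do not occur in the example data: input_1 = 1 together with input_2 = 0, and a first row in which exactly one input is 1. As far as I can tell, the one-liner above treats the first case as "hold the previous value", and in the second case it leaves a NaN that the final astype(int) cannot convert. The sketch below is one possible variant of the same forward-fill idea that covers both, assuming rule 1 always wins and the output starts at 0. The names `latch_vectorized` and `output_3` are made up for illustration.

```python
import numpy as np
import pandas as pd


def latch_vectorized(df):
    """Forward-fill latch; assumes 0/1 inputs and an output of 0 before row 0."""
    in1 = df['input_1'].to_numpy()
    in2 = df['input_2'].to_numpy()
    # 1.0 where input_1 sets the output, 0.0 where input_2 == 0 resets it,
    # NaN where the row only holds the previous value (input_1 == 0, input_2 == 1).
    state = np.where(in1 == 1, 1.0, np.where(in2 == 0, 0.0, np.nan))
    # Forward-fill the hold rows; leading holds fall back to the assumed initial 0.
    return pd.Series(state, index=df.index).ffill().fillna(0).astype(int)


# Small made-up frame that includes both edge cases.
demo = pd.DataFrame({'input_1': [0, 1, 0, 1, 0, 0],
                     'input_2': [1, 1, 1, 0, 1, 0]})
demo['output_3'] = latch_vectorized(demo)
print(demo)
```

On the question's own example data this should give the same column as output_2, since neither edge case appears there.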
