I'm trying to find a vectorized/fast/numpy-friendly way to convert the following values in column A to column B:
ID   A   B
 1   0   0
 2   0   0
 3   1   0
 4   1   1
 5   0   1
 6   0   1
 7  -1   1
 8   0   0
 9   1   0
10   0   1
11   0   1
12   1   1
13   0   1
14  -1   1
15   0   0
The algorithm to define column 'B' is: fill every gap between a 1 and the following -1 in column A with the value 1, skipping the row that contains the opening 1 itself. That is, ID4-ID7 in column B are filled with ones (given the initial 1 in column A at ID3), and ID10-ID14 are filled with ones (since column A at ID9 is 1).
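As a rough sketch of which index ranges get filled (using the sample data above; the searchsorted step is just one illustrative way to pair each -1 with the first 1 of its region, not part of the question's code):

import numpy as np

A = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])
starts = np.flatnonzero(A == 1)    # candidate region openers
ends = np.flatnonzero(A == -1)     # region closers
# keep only the first 1 after the previous -1 (repeated 1s inside a region are ignored)
openers = starts[np.searchsorted(starts, np.r_[0, ends[:-1]])]
for s, e in zip(openers, ends):
    print(s + 1, e)    # 0-based: (3, 6) and (9, 13), i.e. ID4-ID7 and ID10-ID14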
While this is easy to do with a for loop, I'm wondering whether a non-loop solution exists. An O(n) loop-based solution is below:
import numpy as np
import pandas as pd

x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])

def make_y(x, showminus=False):
    y = x * 0
    state = 0  # current fill state: 1, 0, or (optionally) -1
    for i, n in enumerate(x):
        if n == 1 and n != state:      # a 1 opens a region; filling starts on the next row
            state = n
            if i < len(y) - 1:
                y[i + 1] = state
        elif n == -1 and n != state:   # a -1 ends the region; this row keeps the previous state before resetting
            y[i] = state
            if showminus:
                state = -1
            else:
                state = 0
        else:                          # otherwise just carry the current state forward
            y[i] = state
    return y

y = make_y(x)
print(pd.DataFrame([x, y]).T)
The above function yields the following performance on my machine:
%timeit y = make_y(x)
10000 loops, best of 3: 28 µs per loop
I'm guessing there must be some way to make the whole thing faster, as I will eventually need to deal with arrays that are 10 million+ elements long...
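For reference, here is a minimal sketch of one possible vectorized equivalent (assuming the default showminus=False behaviour; make_y_vectorized is a hypothetical name, and forward-filling the index of the most recent nonzero entry is just one approach, not necessarily the fastest):

import numpy as np

def make_y_vectorized(x):
    x = np.asarray(x)
    n = len(x)
    # index of the most recent nonzero entry at or before each position (-1 if none yet)
    last = np.maximum.accumulate(np.where(x != 0, np.arange(n), -1))
    # state after each row: 1 while the most recent nonzero entry was a +1, else 0
    state = np.where(last >= 0, x[np.maximum(last, 0)] == 1, False).astype(x.dtype)
    # make_y starts writing the state on the *next* row, so shift everything down by one
    return np.concatenate(([0], state[:-1]))

x = np.array([0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0])
print(np.array_equal(make_y_vectorized(x), make_y(x)))   # should print True for this sample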
Comments:

In the make_y loop function there is a parameter to also keep track of the -1 regions as well. I left that part out of the question, in order to simplify things (initially). – bazel

Try mask = df.loc[(df['A'].shift() == 1) | (df['A'] == -1)], then collapse this again using mask.loc[(mask['A'] == -1) | (mask['A'].shift(-1) != -1)], which should then display the start and end indices; you can then iterate over them, or pull the indices into a list of (beg, end) tuple pairs, and set those ranges to 1. – EdChum
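A rough illustration of EdChum's suggestion on the sample data above (the two mask expressions are his; the pairing and filling loop at the end is an added, hypothetical finishing step):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 1, 0, 0, -1, 0, 1, 0, 0, 1, 0, -1, 0]})

# rows that either follow a 1 or are themselves a -1
mask = df.loc[(df['A'].shift() == 1) | (df['A'] == -1)]
# drop intermediate rows so (for this sample) only the start and end index of each region remain
collapsed = mask.loc[(mask['A'] == -1) | (mask['A'].shift(-1) != -1)]

idx = collapsed.index.values               # [3, 6, 9, 13] for the sample data
pairs = list(zip(idx[::2], idx[1::2]))     # (beg, end) pairs: [(3, 6), (9, 13)]

B = np.zeros(len(df), dtype=int)
for beg, end in pairs:
    B[beg:end + 1] = 1                     # fill each region, inclusive of the -1 row
df['B'] = B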