2
votes

I have a data frame like this,

df
col1    col2
 A        2
 B        3
 C        1
 D        4
 E        6
 F        1
 G        2
 H        8
 I        1
 J       10

Now I want to create another column col3 with grouping all the col2 values which are under below 5 and keep col3 values as 1 to number of groups, so the final data frame would look like,

col1    col2     col3
 A        2        1
 B        3        1
 C        1        1
 D        4        1
 E        6        2
 F        1        2
 G        2        2
 H        8        3
 I        1        3
 J       10        4

I could do this comparing the the prev values with the current values and store into a list and make it the col3.

But the execution time will be huge in this case, so looking for some shortcuts/pythonic way to do it most efficiently.

1

1 Answers

0
votes

Compare by Series.gt for > and then use Series.cumsum. New column always starts by 0, because first values of column is less like 5, else it should be 1:

df['col3'] = df['col2'].gt(5).cumsum()
print (df)
  col1  col2  col3
0    A     2     0
1    B     3     0
2    C     1     0
3    D     4     0
4    E     6     1
5    F     1     1
6    G     2     1
7    H     8     2
8    I     1     2
9    J    10     3

So for general solution starting by 1 use this trick - compare first values if less like 5, convert to integers for True->1 and False->0 and add to column:

N = 5
df['col3'] = df['col2'].gt(N).cumsum() + int(df.loc[0, 'col2'] < 5)

df = df.assign(col21 = df['col2'].add(pd.Series({0:5}), fill_value=0).astype(int))

N = 5
df['col3'] = df['col2'].gt(N).cumsum() + int(df.loc[0, 'col2'] < N)
#test for first value > 5
df['col31'] = df['col21'].gt(N).cumsum() + int(df.loc[0, 'col21'] < N)
print (df)
  col1  col2  col21  col3  col31
0    A     2      7     1      1
1    B     3      3     1      1
2    C     1      1     1      1
3    D     4      4     1      1
4    E     6      6     2      2
5    F     1      1     2      2
6    G     2      2     2      2
7    H     8      8     3      3
8    I     1      1     3      3
9    J    10     10     4      4