0
votes

I have a dataframe as follows

col1  col2  col3 col4

 1      1     1    1     
 1      1     1    2     
 1      1     1    3     
 1      1     2    1     
 1      1     2    2     
 1      1     3    1     
 1      1     4    2     

x = data.frame(
Col1 = c(1,1,1,1,1,1,1), 
Col2 = c(1,1,1,1,1,1,1), 
Col3 = c(1,2,3,1,2,4,1), 
Col4 = c(1,1,1,2,2,2,3), 
)

I have to create an additional column where, if the col1, col2, and col3 have the same value but col4 has a different value, for all the rows that has different col4, the new column col5 (which is by default 0) should be incremented by 1. For example (Notice rows 1,4 & 7)

col1  col2  col3 col4  col5

 1      1     1    1     2
 1      1     1    2     2
 1      1     1    3     2
 1      1     2    1     1
 1      1     2    2     1
 1      1     3    1     0
 1      1     4    2     0

Is there a way to do that using r command?

2
I struggle to understand your rules for col5. You don't explain what happens when col1, col2, col3, col4 all have the same value. What does " for all those matching rows of col4 that is distinct, col5 should be incremented by 1, starting from 0" mean? Rows matching what? Being distinct how?Maurits Evers
Hi, I edited the question, hope it clears up the uncertainty in my requirementVisahan

2 Answers

3
votes

I think you are looking for

library(dplyr)

x %>%
  group_by(Col1, Col2, Col3) %>%
  mutate(col5 = n_distinct(Col4) - 1)

#  Col1  Col2  Col3  Col4  col5
#  <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1     1     1     1     2
#2     1     1     2     1     1
#3     1     1     3     1     0
#4     1     1     1     2     2
#5     1     1     2     2     1
#6     1     1     4     2     0
#7     1     1     1     3     2

This counts number of unique values in each group starting from 0 (by subtracting 1).


which in base R would be

with(df, ave(Col4, Col1, Col2, Col3, FUN = function(x) length(unique(x))) - 1)
#[1] 2 1 0 2 1 0 2
0
votes

Using data.table

library(data.table)
setDT(x)[, Col5 := uniqueN(Col4) - 1, by = Col1:Col3][]
#   Col1 Col2 Col3 Col4 Col5
#1:    1    1    1    1    2
#2:    1    1    2    1    1
#3:    1    1    3    1    0
#4:    1    1    1    2    2
#5:    1    1    2    2    1
#6:    1    1    4    2    0
#7:    1    1    1    3    2