Record count of unique child columns in R

Question

I have a dataframe as follows

col1  col2  col3 col4

 1      1     1    1     
 1      1     1    2     
 1      1     1    3     
 1      1     2    1     
 1      1     2    2     
 1      1     3    1     
 1      1     4    2     

x = data.frame(
Col1 = c(1,1,1,1,1,1,1), 
Col2 = c(1,1,1,1,1,1,1), 
Col3 = c(1,2,3,1,2,4,1), 
Col4 = c(1,1,1,2,2,2,3), 
)

I have to create an additional column where, if the col1, col2, and col3 have the same value but col4 has a different value, for all the rows that has different col4, the new column col5 (which is by default 0) should be incremented by 1. For example (Notice rows 1,4 & 7)

col1  col2  col3 col4  col5

 1      1     1    1     2
 1      1     1    2     2
 1      1     1    3     2
 1      1     2    1     1
 1      1     2    2     1
 1      1     3    1     0
 1      1     4    2     0

Is there a way to do that using r command?

I struggle to understand your rules for col5. You don't explain what happens when col1, col2, col3, col4 all have the same value. What does " for all those matching rows of col4 that is distinct, col5 should be incremented by 1, starting from 0" mean? Rows matching what? Being distinct how? — Maurits Evers
Hi, I edited the question, hope it clears up the uncertainty in my requirement — Visahan

Ronak Shah Ronak Shah · Accepted Answer · 2019-03-25T04:49:34

I think you are looking for

library(dplyr)

x %>%
  group_by(Col1, Col2, Col3) %>%
  mutate(col5 = n_distinct(Col4) - 1)

#  Col1  Col2  Col3  Col4  col5
#  <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1     1     1     1     2
#2     1     1     2     1     1
#3     1     1     3     1     0
#4     1     1     1     2     2
#5     1     1     2     2     1
#6     1     1     4     2     0
#7     1     1     1     3     2

This counts number of unique values in each group starting from 0 (by subtracting 1).

which in base R would be

with(df, ave(Col4, Col1, Col2, Col3, FUN = function(x) length(unique(x))) - 1)
#[1] 2 1 0 2 1 0 2

Record count of unique child columns in R

2 Answers