How to identify duplicate items within a subset of data

Question

I am trying to identify which trials in a long-format dataset are repeated, but only within certain blocks for each participant. My data is structured something like this:

sub  block  trial  item
1    1      1      A
1    1      2      B
1    2      1      A
1    2      2      B
1    3      1      B
1    3      2      C
2    1      1      A
2    1      2      B
2    2      1      A
2    2      2      B
2    3      1      B
2    3      2      C

What I would like to create is a new column that indicates, for each participant, which items are repeated, and another new column with a new trial code, but only if the items are repeated in blocks 2 and 3. So it would look something like this:

sub  block  trial  item   dup      newtrial
1    1      1      A      FALSE    1
1    1      2      B      FALSE    2
1    2      1      A      FALSE    1
1    2      2      B      FALSE    2
1    3      1      C      FALSE    1
1    3      2      B      TRUE     102
2    1      1      A      FALSE    1
2    1      2      B      FALSE    2
2    2      1      A      FALSE    1
2    2      2      B      FALSE    2
2    3      1      C      FALSE    1
2    3      2      B      TRUE     102

I have been able to identify duplicates across the whole dataset, and add 100 to the trial number of each duplicate, using the following code:

data$dup <- duplicated(data$item)
data$newtrial <- NA

data <- transform(data,
                  item = make.unique(as.character(item)),
                  newtrial = ifelse(duplicated(item), trial + 100, trial))
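For illustration, here is a minimal sketch that rebuilds the input table above as a data frame (named `data` to match the code above) and compares the whole-dataset check with one keyed on subject. Passing a two-column data frame to base R's `duplicated()` flags a row only when its combination of values has already appeared in an earlier row, so the check effectively restarts for each `sub`.

```r
# Input table from the question, rebuilt as a data frame
data <- data.frame(
  sub   = rep(1:2, each = 6),
  block = rep(rep(1:3, each = 2), times = 2),
  trial = rep(1:2, times = 6),
  item  = c("A", "B", "A", "B", "B", "C",
            "A", "B", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Whole-dataset check: any item seen earlier, in any subject or block, is flagged
global_dup <- duplicated(data$item)

# Per-subject check: a row is flagged only if its (sub, item) pair occurred earlier
per_sub_dup <- duplicated(data[, c("sub", "item")])

print(data.frame(data, global_dup, per_sub_dup))
```

With this input, the whole-dataset version flags subject 2's first `A` and `B` (block 1) because subject 1 already used those items; the per-subject version does not. It still ignores the restriction to blocks 2 and 3, though.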

What I have not been able to figure out is how to restrict this to each individual subject, and to only certain blocks within each subject.
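One way to add the block restriction in base R is sketched below. It assumes the rows are already ordered by `sub`, `block` and `trial` (as in the example), so that `duplicated()` treats the occurrence in the earlier block as the original. Only rows in blocks 2 and 3 are compared, keyed on subject and item, and every other row is left as `FALSE`.

```r
# Input table from the question, rebuilt as a data frame
data <- data.frame(
  sub   = rep(1:2, each = 6),
  block = rep(rep(1:3, each = 2), times = 2),
  trial = rep(1:2, times = 6),
  item  = c("A", "B", "A", "B", "B", "C",
            "A", "B", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Only blocks 2 and 3 take part in the comparison
in_scope <- data$block %in% c(2, 3)

# Rows outside blocks 2 and 3 are never flagged
data$dup <- FALSE
# Within the in-scope rows, flag a row if its (sub, item) pair occurred
# earlier, so each subject is checked separately
data$dup[in_scope] <- duplicated(data[in_scope, c("sub", "item")])

# Repeated items get 100 added to their trial number
data$newtrial <- ifelse(data$dup, data$trial + 100L, data$trial)

print(data)
```

On the question's input this flags the `B` in block 3, trial 1 for each subject (it already appeared in block 2) and gives it a `newtrial` of 101. The desired output shows 102 instead because its block 3 lists `C` before `B`, unlike the input table.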

Thanks!

Your desired output does not seem to match your input. Why are the rows labelled dup=TRUE considered duplicates within their sub and block? — aichao

agstudy · Accepted Answer · 2016-10-26T17:55:58

Another option, using data.table:

library(data.table)
xt <- fread("sub  block  trial  item
1    1      1      A
1    1      2      B
1    2      1      A
1    2      2      B
1    3      1      B
1    3      2      B
2    1      1      A
2    1      2      B
2    2      1      A
2    2      2      B
2    3      1      B
2    3      2      B")

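# Within each sub/block group, flag items already seen earlier in that
# group and add 100 to their trial number
xt[,
   c("dup", "ntrial") := {
     dup <- duplicated(item)
     tt <- ifelse(dup, trial + 100L, trial)
     list(dup, tt)
   },
   by = .(sub, block)]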
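The answer groups by both `sub` and `block`, so it only catches repeats inside a single block, and its example data accordingly has `B` twice in block 3. If the goal is instead the one described in the question, repeats across blocks 2 and 3 within each subject, a variant of the same data.table idea is to select those rows with `i` and group by `sub` alone. This is a sketch using the question's original input rather than the answer's modified data:

```r
library(data.table)

# Input table from the question (block 3 holds B then C)
xt <- data.table(
  sub   = rep(1:2, each = 6),
  block = rep(rep(1:3, each = 2), times = 2),
  trial = rep(1:2, times = 6),
  item  = c("A", "B", "A", "B", "B", "C",
            "A", "B", "A", "B", "B", "C")
)

# Default: no row is a duplicate
xt[, dup := FALSE]
# Compare only blocks 2 and 3, separately for each subject
xt[block %in% c(2, 3), dup := duplicated(item), by = sub]
# Repeated items get 100 added to their trial number
xt[, ntrial := ifelse(dup, trial + 100L, trial)]

print(xt)
```

Rows outside blocks 2 and 3 keep `dup = FALSE` from the first assignment, because `:=` combined with a subset in `i` only updates the selected rows.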
