How to remove duplicate values of only one column element based on another column

Question

I have data where I have one element of interest (111) in a column. My data looks something like this:

pcp2 <- data.frame(A = c(rep(111, 4), rep(222, 5), rep(111,5), 
        rep(222,5)),B= c(rep(1,9), rep(2,10))) 

     A  B
1   111 1
2   111 1
3   111 1
4   111 1
5   222 1
6   222 1
7   222 1
8   222 1
9   222 1
10  111 2
11  111 2
12  111 2
13  111 2
14  111 2
15  222 2
16  222 2
17  222 2
18  222 2
19  222 2

I want to collapse all of only variable 222 and leave all 111 untouched throughout my data based upon the IDs in column B like so:

All I have been able to find that is close reduces all the variables (both 111 and 222).

library(data.table)
dat <- as.data.table(pcp2, key = "B")
data <- dat[, by = key(dat)][!duplicated(A == "222")]

as follows:

   A    B
1 111   1
2 222   2

I've played around with various iterations of this code and others, but other things I've tried as well have just gotten my data down to two rows or collapsed both 111 and 222. I.e. this wouldn't be sufficient:

Does anyone have any guidance for how you would maintain the 111 and collapse the 222 within one column based on another column like the example (B in this case)? I know other examples are similar but none seem to give the option to specify not collapsing a particular variable within one column while collapsing the other(s).

if you change pcp2[7, 2] = 2 - do you expect to get 3 rows with A = 222 at the end, or just 2? — eddi
Just 2. I would expect one value of 222 PER each unique identifier in column B, no more though. So for this one 222 for when column B is 1 and one 222 for when column B is 2. — cebola

Frank Frank · Accepted Answer · 2017-11-17T16:49:29

You can add a counter for the uncollapsed rows:

dat[, r := rowid(A)*(A == 111)]
unique(dat, by=c("B","r"))[, !"r"]

      A B
 1: 111 1
 2: 111 1
 3: 111 1
 4: 111 1
 5: 222 1
 6: 111 2
 7: 111 2
 8: 111 2
 9: 111 2
10: 111 2
11: 222 2

(This requires data.table, while Gregor's answer works in base R as well.)

How to remove duplicate values of only one column element based on another column

3 Answers