R: removing rows and replacing values using conditions from multiple columns

Question

I want to filter out all rows with var3 < 5 while keeping at least one occurrence of each value of var1.

> foo <- data.frame(var1=c(1, 1, 8, 8, 5, 5, 5), var2=c(1,2,3,2,4,6,8), var3=c(7,1,1,1,1,1,6))
> foo
  var1 var2 var3
1    1    1    7
2    1    2    1
3    8    3    1
4    8    2    1
5    5    4    1
6    5    6    1
7    5    8    6

subset(foo, (foo$var3>=5)) would remove rows 2 to 6, and I would lose var1==8.

I want to remove a row if there is another row with the same value of var1 that fulfills the condition foo$var3 >= 5. See row 5.
I want to keep one row, assigning NA to var2 and var3, if no occurrence of a value of var1 fulfills the condition foo$var3 >= 5.

This is the result I expect:

  var1 var2 var3
1    1    1    7
3    8   NA   NA
7    5    8    6

This is the closest I got:

> foo$var3[ foo$var3 < 5 ] = NA
> foo$var2[ is.na(foo$var3) ] = NA
> foo
  var1 var2 var3
1    1    1    7
2    1   NA   NA
3    8   NA   NA
4    8   NA   NA
5    5   NA   NA
6    5   NA   NA
7    5    8    6

Now I just need to know how to conditionally remove the right rows (2, 3 or 4, 5, 6): Remove the row if var2 & var3 are NA and if the value of var1 has more than 1 occurrence.

But there is surely a much simpler and more elegant way to approach this little problem.
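
For reference, the rule described above could be sketched in base R roughly as follows. This is only an illustration, starting from the original foo (before the NA assignments) and assuming that the first row of a group with no passing row is the one to keep; the names ok, group_has_ok, first_of_group and keep are hypothetical.

```r
foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))

# Rows that fulfill the condition.
ok <- foo$var3 >= 5

# TRUE for every row whose var1 value has at least one passing row.
group_has_ok <- foo$var1 %in% foo$var1[ok]

# TRUE for the first occurrence of each var1 value.
first_of_group <- !duplicated(foo$var1)

# Keep passing rows, plus one placeholder row for each group without any.
keep <- ok | (!group_has_ok & first_of_group)
res <- foo[keep, ]

# Blank out var2 and var3 in the placeholder rows.
res[!ok[keep], c("var2", "var3")] <- NA
print(res)
```

With the sample data this keeps rows 1, 3 and 7, with NA in var2 and var3 for row 3.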

edit: changed foo to resemble my use case more

Joris Meys · Accepted Answer · 2011-01-16T12:31:41

The fastest way is to use merge:

> merge(foo[foo$var3>=5,],unique(foo$var1),by.x=1,by.y=1,all.y=TRUE)
  var1 var2 var3
1    1    1    7
2    5    8    6
3    8   NA   NA

unique(foo$var1) gives the unique values in var1. These are matched against the rows of the data frame where var3 is at least five (the condition from the question). You merge on the first column of each argument (by.x=1, by.y=1) and request that every value in y appears in the result (all.y=TRUE); values of var1 without a matching row get NA in var2 and var3. See also ?merge.
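
The same merge can be spelled out step by step with named columns, which may make the roles of x and y easier to see. This is a minimal sketch, not a different method; the intermediate names kept, keys and res are hypothetical, and y is wrapped in a one-column data frame so that by = "var1" can be used.

```r
foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))

# x: only the rows that satisfy the condition.
kept <- foo[foo$var3 >= 5, ]

# y: one row per distinct value of var1.
keys <- data.frame(var1 = unique(foo$var1))

# all.y = TRUE keeps every key; keys with no row in `kept` get NA elsewhere.
res <- merge(kept, keys, by = "var1", all.y = TRUE)
print(res)
```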

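If you want to preserve the original order of var1:

> merge(foo[foo$var3>=5,],unique(foo$var1),by.x=1,by.y=1,
+ all.y=TRUE)[rank(unique(foo$var1)),]
  var1 var2 var3
1    1    1    7
3    8   NA   NA
2    5    8    6

merge sorts the result on the column used for the mapping. rank gives the position of each original value in that sorted result, so using it as row indices restores the original order. (For this data order returns the same indices as rank, but in general it does not, so rank is the safe choice.) See also ?rank and ?order.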
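
To restore the original key order after merge has sorted the result, the row index has to say where each original key sits in the sorted result. The sketch below compares match(), rank() and order() for that purpose. It uses a hypothetical data frame bar whose keys first appear in the order 8, 1, 5, chosen because the three functions no longer agree on it.

```r
# Hypothetical data: keys first appear in the order 8, 1, 5.
bar <- data.frame(var1 = c(8, 8, 1, 5, 5),
                  var2 = c(3, 2, 1, 4, 8),
                  var3 = c(1, 1, 7, 1, 6))

u <- unique(bar$var1)
m <- merge(bar[bar$var3 >= 5, ], u, by.x = 1, by.y = 1, all.y = TRUE)
# m is sorted on var1: 1, 5, 8

by_match <- m[match(u, m$var1), ]  # look up each original key in m
by_rank  <- m[rank(u), ]           # position of each key in sorted order
by_order <- m[order(u), ]          # the permutation that sorts u, not its inverse

print(by_match$var1)  # 8 1 5
print(by_rank$var1)   # 8 1 5
print(by_order$var1)  # 5 8 1
```

match() states the intent most directly and does not depend on the keys being numeric or free of ties.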
