I want to filter out all rows where var3 < 5 while keeping at least one occurrence of each value of var1.
> foo <- data.frame(var1=c(1, 1, 8, 8, 5, 5, 5), var2=c(1,2,3,2,4,6,8), var3=c(7,1,1,1,1,1,6))
> foo
  var1 var2 var3
1    1    1    7
2    1    2    1
3    8    3    1
4    8    2    1
5    5    4    1
6    5    6    1
7    5    8    6
subset(foo, (foo$var3>=5))
would remove rows 2 to 6, and I would lose var1 == 8.
- I want to remove the row if there is another row with the same value of var1 that fulfills the condition foo$var3 >= 5. See row 5.
- I want to keep one row, assigning NA to var2 and var3, if none of the occurrences of that value of var1 fulfill the condition foo$var3 >= 5.
This is the result I expect:
  var1 var2 var3
1    1    1    7
3    8   NA   NA
7    5    8    6
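The two rules above can be sketched in base R as follows. This is only one possible reading of the requirement, assuming var3 contains no NA values to begin with and that the first row of a group is the one to keep when no row in that group qualifies. The names ok, group_has_ok, keep and result are illustrative and not part of the original code.

```r
foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))

# Rows that fulfill the condition (assumes var3 has no NA values)
ok <- foo$var3 >= 5

# TRUE for every row whose var1 value has at least one fulfilling row
group_has_ok <- foo$var1 %in% foo$var1[ok]

# Keep fulfilling rows, plus the first row of each var1 value that has none
keep <- ok | (!group_has_ok & !duplicated(foo$var1))

result <- foo[keep, ]
# Blank out var2 and var3 on the kept rows that did not fulfill the condition
result[!ok[keep], c("var2", "var3")] <- NA
result
```

With the example data this keeps rows 1, 3 and 7, with var2 and var3 set to NA on row 3.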
This is the closest I got:
> foo$var3[ foo$var3 < 5 ] = NA
> foo$var2[ is.na(foo$var3) ] = NA
> foo
  var1 var2 var3
1    1    1    7
2    1   NA   NA
3    8   NA   NA
4    8   NA   NA
5    5   NA   NA
6    5   NA   NA
7    5    8    6
Now I just need to know how to conditionally remove the right rows (2, 3 or 4, 5, 6): remove a row if var2 and var3 are both NA and its value of var1 occurs more than once, while still leaving one row per value of var1.
But there is surely a much simpler, more elegant way to approach this little problem.
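For reference, the removal step described above might be sketched like this, continuing from the NA-filled data frame. It assumes that a row counts as "blank" only when both var2 and var3 are NA, and that the first blank row is the one to keep for a var1 value with no remaining data. The names blank, has_data, drop and result are made up for illustration.

```r
foo <- data.frame(var1 = c(1, 1, 8, 8, 5, 5, 5),
                  var2 = c(1, 2, 3, 2, 4, 6, 8),
                  var3 = c(7, 1, 1, 1, 1, 1, 6))
foo$var3[foo$var3 < 5] <- NA
foo$var2[is.na(foo$var3)] <- NA

# Rows where both var2 and var3 were blanked out
blank <- is.na(foo$var2) & is.na(foo$var3)

# TRUE for rows whose var1 value still has at least one non-blank row
has_data <- foo$var1 %in% foo$var1[!blank]

# Drop a blank row if its var1 value has real data elsewhere,
# or if it is a repeated row of a var1 value that is blank throughout
drop <- blank & (has_data | duplicated(foo$var1))

result <- foo[!drop, ]
result
```

On the example data, rows 2, 4, 5 and 6 are dropped, leaving rows 1, 3 and 7.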
Edit: changed foo to resemble my use case more closely.