library(tidyverse)

Using the sample data below, I want to apply dplyr::distinct() conditionally. I want to remove duplicates in the ID column, keeping only the row with the lowest value of "Rate" for each ID. For example, for "A1A1", the row with a Rate of 2 should be dropped, while for "CC33", the rows with Rate equal to 2 and 3 should be removed. I also want to keep all columns, as dplyr::distinct() does with ".keep_all = TRUE".

I tried the code below, but it drops the Subject column.

DF2 %>% group_by(ID) %>% summarise(Min_rate = min(Rate))

I also played around with group_by, mutate, and if_else, but couldn't get it to work...

DF2%>%group_by(ID)%>%mutate(if_else(Rate=min(Rate),Rate,distinct(ID)

Help would be appreciated...

Sample Data:

ID<-c("A1A1","A22B","CC33","D33D","A1A1","4DD8","4DD8","CC33","CC33","56DK","F4G5","8Y0R")
Subject<-c("Subject1","Subject2","Subject3","Subject4","Subject5","Subject6","Subject7","Subject8","Subject9","Subject10","Subject11","Subject12")
Rate<-c(1,2,3,2,2,3,2,1,2,2,2,3)
DF2 <- tibble(ID, Subject, Rate)  # tibble() replaces the deprecated data_frame()

1 Answer


I found a way to accomplish what I want: first, use dplyr's "group_by" and "mutate" together with "if_else" to flag the smallest value of Rate within each ID group with a 1, and all other values with a 0.

DF2 <- DF2 %>% group_by(ID) %>% mutate(Rate_Min = if_else(Rate == min(Rate), 1, 0))

I then use dplyr's "filter" to drop the rows flagged with 0, which leaves only the lowest-rate row(s) for each ID.

DF2 <- DF2 %>% filter(Rate_Min == 1)
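For comparison, here is a minimal sketch of two shorter ways to get the same rows without a helper column. It assumes dplyr 1.0.0 or later (for slice_min()) and rebuilds the sample data from the question. The names dedup_slice and dedup_distinct are only illustrative. The second variant is closer to the original question, since it sorts first and then lets distinct() keep the first row per ID with ".keep_all = TRUE".

```r
library(dplyr)

# Sample data from the question
DF2 <- tibble(
  ID      = c("A1A1", "A22B", "CC33", "D33D", "A1A1", "4DD8",
              "4DD8", "CC33", "CC33", "56DK", "F4G5", "8Y0R"),
  Subject = paste0("Subject", 1:12),
  Rate    = c(1, 2, 3, 2, 2, 3, 2, 1, 2, 2, 2, 3)
)

# Variant 1: keep the single lowest-Rate row within each ID group.
# with_ties = FALSE guarantees exactly one row per ID even if the
# minimum Rate occurs more than once.
dedup_slice <- DF2 %>%
  group_by(ID) %>%
  slice_min(Rate, n = 1, with_ties = FALSE) %>%
  ungroup()

# Variant 2: sort so the lowest Rate comes first within each ID,
# then let distinct() keep the first row per ID with all columns.
dedup_distinct <- DF2 %>%
  arrange(ID, Rate) %>%
  distinct(ID, .keep_all = TRUE)
```

One difference from the filter approach above: filter on Rate == min(Rate) keeps every row tied for the minimum, whereas both variants here return exactly one row per ID. The sample data has no ties at the minimum, so all three approaches should select the same eight rows, though possibly in a different row order.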