I have two data frames, each with an ID column. I want to check whether each ID1 value in df1 also appears in ID2 of df2 and, if it does, add a new column to df1 that says "duplicate exist". My code is not working. What am I doing wrong?

The names of the columns to validate are supplied by the user as input parameters:

df1 <-  data.frame(ID1= c("EMT1","EMT2","EMT3","EMT4","EMT5","EMT6","EMT7","EMT8","EMT9","EMT10","EMT11","EMT12","EMT13","EMT14","EMT15","EMT16","EMT17","EMT18","EMT19","EMT20","EMT21","EMT22","EMT23","EMT24","EMT25","EMT25","EMT27"))
df2 <-  data.frame(ID2= c("EMT10","EMT10","EMT10","EMT8","EMT8","EMT8","EMT6","EMT10","EMT6","","EMT6","EMT6","EMT5","EMT5","EMT5","EMT5","EMT5","EMT5","EMT5","EMT4","EMT4","EMT4","EMT4","EMT23","EMT32","EMT241","EMT51"))

empid_new = "ID1"
empid_old = "ID2"


uniqu_emp <- df2 %>% select(empid_old) %>% distinct()
df1 <- df1 %>% mutate(`dupe id` = ifelse((df1[[empid_new]] %in% uniqu_emp)== TRUE, "duplicate exist",""))


1 Answer

Since you are using dplyr, you can refer to the data frame's columns through the .data pronoun, which lets you look up a column by a name stored in a variable.

distinct() returns a data frame, but %in% needs a vector to compare against, so extract the column with pull().
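
To see why the data frame version finds no matches, here is a small base R sketch. The two-row `ids` data frame is made up for illustration and is not the data from the question.

```r
# Hypothetical two-row lookup table, for illustration only.
ids <- data.frame(ID2 = c("EMT4", "EMT5"), stringsAsFactors = FALSE)

# A data frame is a list of columns, so %in% compares "EMT4" against
# the column as a whole rather than against each value inside it.
in_df <- "EMT4" %in% ids

# Extracting the column gives a plain vector, which is what %in% expects.
in_vec <- "EMT4" %in% ids[["ID2"]]

print(in_df)   # FALSE
print(in_vec)  # TRUE
```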

library(dplyr)
uniqu_emp <- df2 %>% distinct(.data[[empid_old]]) %>% pull()

df1 %>% 
   mutate(`dupe id` = ifelse(.data[[empid_new]] %in% uniqu_emp, 
                       "duplicate exist",""))

#     ID1         dupe id
#1   EMT1                
#2   EMT2                
#3   EMT3                
#4   EMT4 duplicate exist
#5   EMT5 duplicate exist
#6   EMT6 duplicate exist
#7   EMT7                
#8   EMT8 duplicate exist
#9   EMT9                
#10 EMT10 duplicate exist
#11 EMT11                
#12 EMT12   
#....             
#....
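
Because the column names come from the user, it may help to wrap the check in a function. Below is a minimal base R sketch of that idea. The helper name `flag_existing_ids`, its arguments, and the small example data frames are assumptions made for illustration, not part of the original code.

```r
# Hypothetical helper: marks rows of df_new whose ID also occurs in df_old.
# new_col and old_col are column names passed as strings.
flag_existing_ids <- function(df_new, df_old, new_col, old_col,
                              flag_col = "dupe id") {
  # Fail early if a requested column name is not present.
  stopifnot(new_col %in% names(df_new), old_col %in% names(df_old))

  # Unique IDs from the old data frame, as a plain vector.
  known_ids <- unique(df_old[[old_col]])

  df_new[[flag_col]] <- ifelse(df_new[[new_col]] %in% known_ids,
                               "duplicate exist", "")
  df_new
}

# Small made-up inputs, not the data from the question.
df_a <- data.frame(ID1 = c("EMT3", "EMT4", "EMT5"))
df_b <- data.frame(ID2 = c("EMT5", "EMT5", "EMT4", ""))

result <- flag_existing_ids(df_a, df_b, new_col = "ID1", old_col = "ID2")
print(result)
```

Note that an empty string in the old ID column, such as the one in df2, would also match an empty ID in the new column, so you may want to drop blank IDs first.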