1
votes

I have two dataframes, A and B. The first dataframe contains years, groupnames, and names. The second dataframe records the complete group lists. I want to create a variable in A dataframe. If a name in A dataframe appear in the B's year/group list, it should be coded "Y", otherwise "N".

A dataframe and B dataframe are:

A <- data.frame(year = c("2000", "2000", "2000", "2000", "2002", "2002", "2003", "2003", "2003"), group = c("Star", "Star", "Sun", "Sun", "Mars", "Earth", "Earth", "Star", "Star"), name = c("John", "Bill", "Summer", "Evans", "Ben", "Mary", "Kally", "John", "Carl"))
B <- data.frame(year = c("2000", "2000", "2000", "2000", "2000", "2002", "2002","2002", "2003", "2003", "2003", "2003", "2003"), group = c("Star", "Star", "Star", "Sun", "Sun", "Mars", "Mars","Earth", "Earth", "Star", "Star", "Star", "Star"), namelist = c("John", "Helen", "Gray", "Summer", "Evans", "Kevin", "Ben", "Ring", "Steve", "Billy", "Carl", "Michel", "John"))

For example, in 2000, B dataframe shows that Star has John, Helen, and Gray. Therefore, because A dataframe's Star in 2000 has John and Carl, A dataframe's new variable's first two rows are "Y" and "N". The result should be like this:

    year    group   name    in_the_list
1   2000    Star    John    Y
2   2000    Star    Bill    N
3   2000    Sun     Summer  Y
4   2000    Sun     Evans   Y
5   2002    Mars    Ben     Y
6   2002    Earth   Mary    N
7   2003    Earth   Kally   N
8   2003    Star    John    Y
9   2003    Star    Carl    Y
3
may i ask why you must use dplyr::mutate rather than base functions? btw there is a typo (Evens rather than Evans) in your data.frame B - chinsoon12
If dplyr::mutate isn't necessary, then you can do: B$in_the_list <- "Y" ___ new <- merge(A,B,by.x=c("year","group","name"),by.y=c("year","group","namelist"),all.x=TRUE) ___ new$in_the_list <- ifelse(is.na(new$in_the_list),"N","Y") - winampman

3 Answers

2
votes

Here's a dplyr-centric answer using mutate to transform the triad of year, group, and name into a unique id variable in both A and B tables, then using a left_join to identify observations in A that share the same year/group/name.

library(dplyr)

A <- A %>% 
  mutate(ygn=paste0(year,group,name))

B <- B %>% 
  mutate(ygn=paste0(year,group,namelist)) %>% 
  mutate(inthelist="Y") %>% 
  select(ygn, inthelist)

Newtbl <- left_join(A,B, by="ygn") %>% 
  select(year,group,name,inthelist) %>% 
  mutate(inthelist=ifelse(is.na(inthelist),"N",inthelist))



Outputs:
  year group   name inthelist
1 2000  Star   John         Y
2 2000  Star   Bill         N
3 2000   Sun Summer         Y
4 2000   Sun  Evans         Y
5 2002  Mars    Ben         Y
6 2002 Earth   Mary         N
7 2003 Earth  Kally         N
8 2003  Star   John         Y
9 2003  Star   Carl         Y
1
votes

paste the columns to form IDs and check if it exists in B

using dplyr package:

mutate(A, in_the_list=c("N", "Y")[paste(year,group,name) %in% Reduce(paste, B) + 1])

using base functions:

A$in_the_list <- c("N", "Y")[Reduce(paste, A) %in% Reduce(paste, B) + 1]
0
votes

You can leverage dplyr's semi_join, which returns the rows of one data.frame that are in another:

library(dplyr)

full_join(A, semi_join(A, rename(B, name = namelist)) %>% 
              mutate(in_the_list = TRUE)) %>% 
    mutate(in_the_list = !is.na(in_the_list))

#   year group   name in_the_list
# 1 2000  Star   John        TRUE
# 2 2000  Star   Bill       FALSE
# 3 2000   Sun Summer       FALSE
# 4 2000   Sun  Evans       FALSE
# 5 2002  Mars    Ben        TRUE
# 6 2002 Earth   Mary       FALSE
# 7 2003 Earth  Kally       FALSE
# 8 2003  Star   John        TRUE
# 9 2003  Star   Carl       FALSE