0
votes

I have a sample data set as below:

df <- data.frame(Group = c("a", "d", "a", "b", "b", "c", "c", "c", "c"), 
                 Year = c("1991", "1992", "1993", "1991", "1992", "1991", "1992", "1993", "1994"), 
                 value = 1:9)

I want to keep, for every year, only the rows whose group also appears in 1991. For example, the groups are a, b, c in 1991 and d, b, c in 1992; therefore, only groups b and c are selected for 1992. The groups are a and c in 1993, and both appear in 1991; therefore, a and c are selected for 1993. The desired output is:

Group   Year
a   1991
b   1991
c   1991
b   1992
c   1992
a   1993
c   1993

This is what I tried:

library(dplyr)

df2 <- df %>% group_by(Group, Year) %>% 
  mutate(total = n()) %>% 
  filter(total == 3)

I can change total == 3 to total == 2, but either way it filters the observations so that every year has the same groups, whereas I want the selection criterion to be based only on 1991.

2 Answers

2
votes

Here's a way with dplyr:

library(dplyr)

df %>% 
  arrange(Year, Group) %>% # not necessary but nice to have I think
  # Year is a character column, so compare against the string "1991"
  filter(Group %in% Group[Year == "1991"])

  Group Year value
1     a 1991     1
2     b 1991     4
3     c 1991     6
4     b 1992     5
5     c 1992     7
6     a 1993     3
7     c 1993     8
8     c 1994     9
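
For anyone who would rather not load dplyr, the same idea can be sketched in base R: collect the groups seen in 1991, then keep every row whose Group is in that set. The names `ref_groups` and `res` are mine, not from the question, and this is only one possible way to write it.

```r
# Recreate the sample data from the question
df <- data.frame(Group = c("a", "d", "a", "b", "b", "c", "c", "c", "c"),
                 Year  = c("1991", "1992", "1993", "1991", "1992",
                           "1991", "1992", "1993", "1994"),
                 value = 1:9)

# Groups that occur in the reference year (hypothetical helper name)
ref_groups <- unique(df$Group[df$Year == "1991"])

# Keep only rows whose Group is in that set, then sort by Year and Group
res <- df[df$Group %in% ref_groups, ]
res <- res[order(res$Year, res$Group), ]
res
```

As with the dplyr version, the 1994 row for group c is kept, because c is one of the 1991 groups.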
0
votes

It is not entirely clear to me what your desired output is, but I prefer a list of data frames that I can later append together.

N.B. This is probably beyond what you need, as the other answers simply filter rows based on values in a subset of the Year column.

library(data.table)

lapply(split.data.frame(df, df$Year), 
       function(x) na.omit(setDT(x)[setDT(df[df$Year=='1991',]), , 
                     on=.(Group)]))

# $`1991`
#    Group Year value i.Year i.value
# 1:     a 1991     1   1991       1
# 2:     b 1991     4   1991       4
# 3:     c 1991     6   1991       6
#  
# $`1992`
#    Group Year value i.Year i.value
# 1:     b 1992     5   1991       4
# 2:     c 1992     7   1991       6
#  
# $`1993`
#    Group Year value i.Year i.value
# 1:     a 1993     3   1991       1
# 2:     c 1993     8   1991       6
#  
# $`1994`
#    Group Year value i.Year i.value
# 1:     c 1994     9   1991       6
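
If you do want to append the pieces together afterwards, here is a rough sketch using `rbindlist()` from data.table. It is a variation on the code above rather than the same code: I use `as.data.table()` instead of `setDT()` so that `df` is left untouched, and `nomatch = NULL` (an inner join) instead of `na.omit()`. The names `ref`, `per_year`, and `combined` are just placeholders I picked.

```r
library(data.table)

# Recreate the sample data from the question
df <- data.frame(Group = c("a", "d", "a", "b", "b", "c", "c", "c", "c"),
                 Year  = c("1991", "1992", "1993", "1991", "1992",
                           "1991", "1992", "1993", "1994"),
                 value = 1:9)

# Reference rows: everything observed in 1991
ref <- as.data.table(df)[Year == "1991"]

# Join each year's rows to the 1991 rows on Group;
# nomatch = NULL drops 1991 groups that are absent from that year
per_year <- lapply(split(df, df$Year),
                   function(x) as.data.table(x)[ref, on = .(Group), nomatch = NULL])

# Stack the per-year tables into a single data.table
combined <- rbindlist(per_year)
combined
```

The `i.Year` and `i.value` columns carry the matching 1991 values and can be dropped if they are not needed.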