Suppose I have a data frame with a few categorical variables and some string columns. I would like to create a new column that, for each row, pastes together the string values from other rows whose categorical columns meet certain match (or non-match) conditions. Here is a toy example:
toy <- data.frame("id" = c(1,2,3,2), "year" = c(2000,2000,2004,2004), "words" = c("a b", "c d", "e b", "c d"))
I would like to create a variable word_pool that is pasted from other rows' words column if two criteria are met: the row's id value is different from the current row's id value and the row's year value is less than the current row's year value.
The result should be:
id  year  words  word_pool
1   2000  a b
2   2000  c d
3   2004  e b    a b c d
2   2004  c d    a b
The first two rows will be blank in the new column since there is no year earlier than 2000 in the toy example. The last row will have only "a b" in the new column: the other earlier row ("c d" from 2000) shares its id of 2, so it is excluded.
I've tried various apply and group_by approaches but none seem to fit the bill exactly. Would appreciate any and all ideas!
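For reference, the two criteria can be written as a plain row-by-row filter in base R. This is only a minimal sketch of the intended logic, not a claim about the best approach: it assumes the words column is character (hence stringsAsFactors = FALSE), that pooled values are joined with a single space, and that rows with no matches should get an empty string. It compares every row against every other row, so it may be slow on large data.

```r
toy <- data.frame(
  id    = c(1, 2, 3, 2),
  year  = c(2000, 2000, 2004, 2004),
  words = c("a b", "c d", "e b", "c d"),
  stringsAsFactors = FALSE  # keep words as character, not factor
)

# For each row i, keep the rows with a different id AND an earlier year,
# then paste their words together (assumption: space-separated).
toy$word_pool <- vapply(seq_len(nrow(toy)), function(i) {
  keep <- toy$id != toy$id[i] & toy$year < toy$year[i]
  paste(toy$words[keep], collapse = " ")  # yields "" when nothing matches
}, character(1))

print(toy)
```

On the toy data this fills word_pool with an empty string for the two 2000 rows, "a b c d" for id 3, and "a b" for the 2004 row of id 2.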