I'm trying to create a new column in a dataframe that computes something based on "matched" variables in another dataframe.
I know I can do this with apply
, but it is the only workflow I haven't figured out how to perform in dplyr
, and I'd like to find a tidy solution.
I've created a data frame df1
with groups (random animals), years, and some statistic. I'd like to mutate a new column in this dataframe that calculates something using that statistic and the output dataframe from a set of linear models (df2
).
# generate reprex df1
year <- seq(1990, 2010, 1)
group <- c("giraffe","rhino","flamingo","shark","turtle")
df1 <- expand.grid(year, group) %>%
rename(year = Var1,
group = Var2) %>%
mutate(stat = runif(length(year),37,44))
# generate reprex df2
term <- c("(Intercept)","y")
df2 <- expand.grid(year, term) %>%
rename(year = Var1,
term = Var2) %>%
mutate(estimate = ifelse(term=="y", -runif(n(), 1.1, 1.3), runif(n(), 50, 60)))
I want to feed the stat
column from df1
into the equation for each year provided by df2
using the equation for a line (y=ax+b
). Note that both the intercept and the slope vary each year in df2
. Conceptually, this would look like
mutate(result = df1$stat*df2$estimate[estimate=="y" and year==year of df1] + df2$estimate[estimate=="(Intercept)" and year==year of df1]
.
For example, for giraffes in 1993, this would be 39.58*(-1.23)+50.14=1.45 (left off some decimal places for readability).
How can I actually do this with dplyr?
set.seed(0)
at the top if you want us to see the same random numbers fromrunif
or other random functions. i.e. my 1993 giraffes is not 39.58.... – Jon Spring