Referencing and indexing other dataframe in dplyr mutate workflow

Question

I'm trying to create a new column in a dataframe that computes something based on "matched" variables in another dataframe.

I know I can do this with apply, but it is the only workflow I haven't figured out how to perform in dplyr, and I'd like to find a tidy solution.

I've created a data frame df1 with groups (random animals), years, and some statistic. I'd like to mutate a new column in this dataframe that calculates something using that statistic and the output dataframe from a set of linear models (df2).

# generate reprex df1 
year <- seq(1990, 2010, 1)
group <- c("giraffe","rhino","flamingo","shark","turtle")
df1 <- expand.grid(year, group) %>% 
  rename(year = Var1,
         group = Var2) %>% 
  mutate(stat = runif(length(year),37,44))

# generate reprex df2 
term <- c("(Intercept)","y")
df2 <- expand.grid(year, term) %>% 
  rename(year = Var1,
         term = Var2) %>% 
  mutate(estimate = ifelse(term=="y", -runif(n(), 1.1, 1.3), runif(n(), 50, 60)))

I want to feed the stat column from df1 into the equation for each year provided by df2 using the equation for a line (y=ax+b). Note that both the intercept and the slope vary each year in df2. Conceptually, this would look like mutate(result = df1$stat*df2$estimate[estimate=="y" and year==year of df1] + df2$estimate[estimate=="(Intercept)" and year==year of df1].

For example, for giraffes in 1993, this would be 39.58*(-1.23)+50.14=1.45 (left off some decimal places for readability).

How can I actually do this with dplyr?

Good practice is to include set.seed(0) at the top if you want us to see the same random numbers from runif or other random functions. i.e. my 1993 giraffes is not 39.58.... — Jon Spring

Jon Spring Jon Spring · Accepted Answer · 2019-06-01T17:35:41

Easiest to approach in dplyr if you can first get all the terms you need for your calculation into the row where you need it. tidyr::spread will make y and (Intercept) available to the row where it's needed, and left_join lets you append the matching data from df1, in this case based on matching years (the column both tables share).

library(tidyverse)
df2 %>%
  spread(term, estimate) %>%
  left_join(df1) %>%
  mutate(result = stat * y + `(Intercept)`)

Referencing and indexing other dataframe in dplyr mutate workflow

1 Answers