1
votes

Wrangling data in R, I would like to mutate a tibble in such a way that the numerical values in the new column are being looked up in a different tibble.

Given a dataset of catheter removals:

# A tibble: 51 x 2
   ExplYear RemovalReason
      <dbl> <chr>        
 1     2018 Infection    
 2     2018 Dysfunction  
 3     2018 Infection    
# ... etc.

where each row corresponds to a single catheter removal, I would like to add a column Implants that holds the total number of _im_plantations in the year that the catheter was removed (_ex_planted).

The implantation numbers are in a tibble impl_per_year:

# A tibble: 13 x 2
   ImplYear     n
      <dbl> <int>
 1     2006    14
 2     2007    46
 3     2008    64
# ... etc.

I have tried to mutate the first tibble with map and a helper function:

lookup = function(year) { impl_per_year[impl_per_year$ImplYear == year,]$n }
explants %>% mutate(Implants = map(ExplYear, lookup)

But this places lots of empty integer vectors into the Implants column:

# A tibble: 51 x 3
   ExplYear RemovalReason Implants
      <dbl> <chr>         <list>     
 1       18 Infection     <int [0]>  
 2       18 Dysfunction   <int [0]>  
 3       18 Infection     <int [0]>  
# ... etc.

What is the mistake?

2

2 Answers

4
votes

You should be able to simply join the two tables by year. If we call your first tibble ExplTibble and your second ImplTibble, using dplyr:

ExplTibble %>% left_join(ImplTibble, by = c("ExplYear" = "ImplYear"))

This should add a new column n containing the number of implants in each year.

0
votes

library(tidyverse)

I altered your data so that my illustration wouldn't have a NULL output.

df <- tribble(
~ExplYear, ~RemovalReason,
2018, "Infection",   
2017, "Dysfunction",
2016, "Infection")

impl_per_year <- tribble(
  ~ImplYear, ~n,
  2017, 14,
  2016, 46,
  2016, 64
)

left_join is the function you're looking for. It's part of the dplyr::join family of functions that do this.

It's good to have the same names for "joining" variables, but in your case you need the by = c( ... ) option to let left_join know what you are joining by.

left_join(df, impl_per_year, by = c("ExplYear" = "ImplYear"))

# A tibble: 4 x 3
  ExplYear RemovalReason     n
     <dbl> <chr>         <dbl>
1     2018 Infection        NA
2     2017 Dysfunction      14
3     2016 Infection        46
4     2016 Infection        64

Depending on what you want, consider right_join, inner_join, etc. until you get the output you are looking for. For example:

inner_join(df, impl_per_year, by = c("ExplYear" = "ImplYear"))

# A tibble: 3 x 3
  ExplYear RemovalReason     n
     <dbl> <chr>         <dbl>
1     2017 Dysfunction      14
2     2016 Infection        46
3     2016 Infection        64

... which gives only successful matches from both tibbles.