2
votes

I would like to create a list column of matrices, where the entries of each matrix are elements from variables already present in the original dataset. My goal is to create a 2 x 2 contingency table for each row of the dataset, and subsequently pass each matrix as an argument to fisher.test.

I have tried adding the new column using a combination of mutate and matrix, but this returns an error. I've also tried using do instead of mutate and this seems like a step in the right direction, but I know this is also incorrect, because the dimensions of the elements are off, and there is only one row in the output.

library(tidyverse)

mtcars %>% 
  mutate(mat = matrix(c(.$disp, .$hp, .$gear, .$carb)))
#> Error: Column `mat` must be length 32 (the number of rows) or one, not 128

mtcars %>% 
  do(mat = matrix(c(.$disp, .$hp, .$gear, .$carb)))
#> # A tibble: 1 x 1
#>   mat            
#>   <list>         
#> 1 <dbl [128 x 1]>

Created on 2019-06-05 by the reprex package (v0.2.1)

I am expecting 32 rows in my output, and the mat column to contain 32 2x2 matrices composed of entries from mtcars$disp, mtcars$hp, mtcars$gear, and mtcars$carb.

My intent is to use map to pass each entry in the mat column as an argument to fisher.test, then extract the odds ratio estimate, and the p-value. But the main focus, of course, is creation of the list of matrices.


2 Answers

2
votes

You have two issues:

  • To store a matrix in a data.frame (tibble), you simply have to put it in a list.
  • To create 2 x 2 matrices (instead of one large matrix built from all 32 rows), you need to work row by row. Currently, matrix(c(.$disp, .$hp, .$gear, .$carb)) concatenates four whole columns, so you get a single 128 x 1 matrix (4 x 32 values), as your do() output shows. You want only the four values from one row at a time, reshaped to 2 x 2.
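
To make the shape problem concrete, here is a small base R sketch. The names all_values, first_row and one_table are only illustrative and do not appear in the question.

```r
# Concatenating four whole columns gives 4 * 32 = 128 values, and
# matrix() without dimensions puts them all in a single column.
all_values <- matrix(c(mtcars$disp, mtcars$hp, mtcars$gear, mtcars$carb))
dim(all_values)

# Four values taken from one row, reshaped column by column into 2 x 2.
first_row <- mtcars[1, ]
one_table <- matrix(
  c(first_row$disp, first_row$hp, first_row$gear, first_row$carb),
  nrow = 2, ncol = 2
)
one_table
```

Note that matrix() fills column by column, so disp and hp end up in the first column and gear and carb in the second.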

Working with pmap allows you to process the rows one by one, but alternatively you can use rowwise which groups by row:

library(tidyverse)
df <- 
  mtcars %>% 
    as_tibble() %>%
    rowwise() %>%
    mutate(mat = list(matrix(c(disp, hp, gear, carb), 2, 2)))

Edit: Now how do you actually use those? Let's take the example of a fisher.test. Note that a test is a complex object, with components (like p.value) and attributes, so we'll have to store them in a list-column.

You can either keep working rowwise, in which case each list-column element is unwrapped automatically, so mat refers to a single matrix:

df %>%
  # keep in mind df is still grouped by row so 'mat' is only one matrix.
  # A test is a complex object so we need to store it in a list-column
  mutate(test = list(fisher.test(mat)), 
         # test is just one test so we can extract p-value directly 
         pval = test$p.value)

Or if you stop working row by row (for which you simply need to ungroup), then mat is a list of matrices onto which you can map functions. We use the map functions from purrr.

library("purrr")

df %>%
  ungroup() %>%
  # Apply the test to each mat using `map` from `purrr` 
  # `map` returns a list so `test` is a list-column
  mutate(test = map(mat, fisher.test), 
         # Now `test` is a list of tests... so you need to map operations onto it 
         # Extract the p-values from each test, into a numeric column rather than a list-column
         pval = map_dbl(test, pluck, "p.value"))

Which one you prefer is a matter of taste :)
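
Since the question also asks for the odds ratio, here is a hedged sketch of how it could be extracted alongside the p-value. It assumes the dplyr and purrr packages are installed. The name results and the round() call are my additions: fisher.test expects counts, and the non-integer disp values would otherwise be rounded by fisher.test itself, typically with a warning.

```r
library(dplyr)
library(purrr)

results <-
  mtcars %>%
  as_tibble() %>%
  rowwise() %>%
  # round() is an assumption added here so that every entry is a whole number
  mutate(mat = list(matrix(round(c(disp, hp, gear, carb)), 2, 2))) %>%
  ungroup() %>%
  mutate(
    test = map(mat, fisher.test),
    # For a 2 x 2 table, `estimate` holds the conditional maximum
    # likelihood estimate of the odds ratio (a named length-one vector).
    odds_ratio = map_dbl(test, ~ unname(.x$estimate)),
    pval = map_dbl(test, ~ .x$p.value)
  )

results %>% select(odds_ratio, pval)
```

Both odds_ratio and pval are ordinary numeric columns, so they can be filtered or summarised like any other variable.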

2
votes

You can use the pmap function from the purrr package inside mutate:

library(tidyverse)
mtcars %>% as_tibble() %>% 
  mutate(mat = pmap(list(disp, hp, gear, carb), ~matrix(c(..1, ..2, ..3, ..4), 2, 2)))

# A tibble: 32 x 12
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb mat              
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <list>           
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4 <dbl[,2] [2 x 2]>
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4 <dbl[,2] [2 x 2]>

Each entry of mat is then a 2x2 matrix with the desired elements. Hope this helps.
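
As a quick sanity check, the sketch below rebuilds the column and inspects it. The name with_mat is only illustrative, and the check assumes dplyr and purrr are installed.

```r
library(dplyr)
library(purrr)

with_mat <-
  mtcars %>%
  as_tibble() %>%
  mutate(mat = pmap(list(disp, hp, gear, carb),
                    ~ matrix(c(..1, ..2, ..3, ..4), 2, 2)))

# First matrix: filled column by column from row 1 of mtcars.
with_mat$mat[[1]]

# TRUE if every element of the list-column is a 2 x 2 matrix.
all_two_by_two <- all(map_lgl(with_mat$mat, ~ identical(dim(.x), c(2L, 2L))))
all_two_by_two
```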