List-columns in tibbles: Can I link a list-column with another list-column?

Question

This is my first post, so please excuse me if I sound silly or the answer I am looking for already exists.

My main problem is this: I have created a tibble containing 4 columns (a character column, two data columns and a column containing a distance matrix for each level of the character column). I am trying to write a function that uses the distance matrices from the 4th column as the dependent variable and some independent variables from the second column. The problem is that R keeps reporting that it cannot find the dependent variable.

The packages I've used are the following:

library(easypackages)
libraries('tidyverse', 'broom')

The tibble containing my IVs looks like this:

IVs_tibble
 # A tibble: 175 × 8
     Site Region  IV.1  IV.2  IV.3  IV.4  IV.5  IV.6
    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1   Site.1    A   387   169   460   234   137   445
2   Site.2    A   197   172   449   192   141   422
3   Site.3    A    86   179   432    78   147   398
4   Site.4    A    14   183   404     4   152   375
5   Site.5    B    86   179   407    80   148   382
6   Site.6    B    18   175   422   154   146   397
7   Site.7    C   132   172   429   211   142   413
8   Site.8    C    99   178   404   120   147   385
9   Site.9    D    73   177   409   150   146   382
10  Site.10   D    77   175   417   182   145   383
# ... with 165 more rows

I then nest it:

by_region <- IVs_tibble %>% group_by(Region) %>% nest()

And here's how it looks:

 by_region
# A tibble: 6 × 2
  Region        data
   <chr>        <list>
1  A         <tibble [60]>
2  B         <tibble [84]>
3  C         <tibble [10]>
4  D         <tibble [6]>
5  E         <tibble [13]>
6  F         <tibble [2]>

Subsequently, I create another tibble containing raw presence/absence data:

regions
# A tibble: 175 × 984
   Region    Site Taxon.1 Taxon.2 Taxon.3
    <chr>   <chr>   <dbl>   <dbl>   <dbl>
1       A  Site.1       1       1       0
2       A  Site.1       0       1       0
3       B  Site.1       1       1       1
4       B  Site.1       0       0       0
5       C  Site.1       1       0       1
6       C  Site.1       0       0       1
7       D  Site.1       1       0       0
8       D  Site.1       1       1       0
9       D  Site.1       0       0       0
10      F Site.10       0       1       0
# ... with 165 more rows, and 982 more variables: (these contain taxa names)

Then I nest that tibble too:

rg <- regions %>% group_by(Region) %>% nest()

And it looks like:

 rg
# A tibble: 6 × 2
  Region        data
   <chr>        <list>
1  A         <tibble [60]>
2  B         <tibble [84]>
3  C         <tibble [10]>
4  D         <tibble [6]>
5  E         <tibble [13]>
6  F         <tibble [2]>

And I rename the data column in order to join it with the tibble containing the IVs:

rr <- rg %>% rename(Communities = data)
rr
# A tibble: 6 × 2
  Region        Communities
   <chr>        <list>
1  A         <tibble [60]>
2  B         <tibble [84]>
3  C         <tibble [10]>
4  D         <tibble [6]>
5  E         <tibble [13]>
6  F         <tibble [2]>

As a following step, I construct a function to compute the matrices:

betamatrices <- function(df) { vegan::betadiver(df, method = 'sim') }
rr <- rr %>% mutate(Dist.matrix = map(Communities, betamatrices))

The rr tibble now looks like this:

rr
    # A tibble: 6 × 3
  Region          Communities        Dist.matrix
   <chr>           <list>             <list>
1     A            <tibble [60]>      <S3: dist>
2     B            <tibble [84]>      <S3: dist>
3     C            <tibble [10]>      <S3: dist>
4     D            <tibble [6]>       <S3: dist>
5     E            <tibble [13]>      <S3: dist>
6     F            <tibble [2]>       <S3: dist>

And then, I join the two tibbles:

my_tibble <- by_region %>% left_join(rr)

The tibble looks like this:

my_tibble
# A tibble: 6 × 4
 Region        IVs               Communities        Dist.matrix
   <chr>       <list>            <list>             <list>
1  A         <tibble [60]>       <tibble [60]>       <S3: dist>
2  B         <tibble [84]>       <tibble [84]>       <S3: dist>
3  C         <tibble [10]>       <tibble [10]>       <S3: dist>
4  D         <tibble [6]>        <tibble [6]>        <S3: dist>
5  E         <tibble [13]>       <tibble [13]>       <S3: dist>
6  F         <tibble [2]>        <tibble [2]>        <S3: dist>

And the function I want to apply looks like this:

mrm_model <- function(df){ecodist::MRM(Dist.matrix~dist(IV.1) + dist(IV.2),data = (df))}

When I try to compute it with the following code:

my_tibble <- my_tibble %>% mutate(mrm = map(IVs, mrm_model))

I get this error message:

Error in mutate_impl(.data, dots) : object 'Dist.matrix' not found.

Do you have any idea why this keeps popping up?

When I try to "correct" the function with the $ sign:

mrm_model <- function(df){ecodist::MRM(my_tibble$Dist.matrix~dist(Area),data = (df))}

I get the following error:

Error in mutate_impl(.data, dots) : invalid type (list) for variable 'my_tibble$Dist.matrix'.

I am an absolute newbie in this type of data-manipulation, so obviously I am over my head and I would greatly appreciate all the help I can get.

Welcome to SO. Please hover over the R tag - it asks for a minimal reproducible example. Here's a guide. I suggest editing your question accordingly. A good one usually provides minimal input data (my_tibble), the desired output data, code tries incl required packages - all copy-paste-run'able in a new/clean R session. Why? It makes it easier for all to follow and participate without guesswork. And you increase the chances of getting helpful comments & answers. :) — lukeA
I am terribly sorry for the inadequate structure of my question! I'll add the packages I've used, as well a subset of my data right away! Thank you for the heads up! — Kostas_k84
No worries, we all started "small" on SO. However, it still takes everyone ~10 mins to reproduce all your tibbles. You could write IVs_tibble <- read.table(header=T, stringsAsFactors=F, text="Site Region IV.1 IV.2 IV.3 IV.4 IV.5 IV.6\n1 Site.1 A 387 169 460 234 137 445") or provide the result of dput(IVs_tibble) or dput(head(IVs_tibble)) followed by -> IVs_tibble to make things convenient (i.e. copy-paste-run'able). In general, it's best to keep things minimal - why not just provide rr using dput(rr)? — lukeA
Following your tibbles, I cannot reproduce your error. mutate with broom gives me "Error: not compatible with STRSXP". Without broom "Error: wrong result size (2), expected 5 or 1" + warnings. — lukeA

Kostas_k84 · Accepted Answer · 2017-10-10T04:50:37

I figured out that the problem can be solved if the tibble contains BOTH the presence/absence data and the IVs. Anyway, thanks for the interest, lukeA.
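
For readers who hit the same error: a likely cause is that map(IVs, mrm_model) hands the function only the IVs tibble, so the formula has nowhere to find Dist.matrix. One possible workaround, sketched below, is to iterate over both list-columns with purrr::map2 and pass the dist object in as its own argument. This is only a sketch under assumptions: the toy tibble, its random numbers, the helper name mrm_model2 and nperm = 99 are all made up for illustration, and it assumes the rows of each IVs tibble are in the same site order as the corresponding distance matrix.

```r
library(tibble)
library(dplyr)
library(purrr)

set.seed(1)

# Toy stand-in for my_tibble (made-up numbers, not the poster's data):
# two regions, each with an IV tibble and a dist object of matching size.
toy <- tibble(
  Region = c("A", "B"),
  IVs = list(
    tibble(IV.1 = runif(8), IV.2 = runif(8)),
    tibble(IV.1 = runif(6), IV.2 = runif(6))
  )
) %>%
  mutate(Dist.matrix = map(IVs, ~ dist(matrix(runif(nrow(.x) * 3), ncol = 3))))

# Hypothetical helper: the response `d` arrives as a function argument, so the
# formula can see it; the predictors are still looked up in `data`.
mrm_model2 <- function(d, ivs) {
  ecodist::MRM(d ~ dist(IV.1) + dist(IV.2), data = ivs, nperm = 99)
}

# map2() walks the two list-columns in parallel, one region at a time.
toy <- toy %>% mutate(mrm = map2(Dist.matrix, IVs, mrm_model2))

# Coefficient table for the first region
toy$mrm[[1]]$coef
```

The second attempt in the question fails for a related reason: my_tibble$Dist.matrix is the whole list-column rather than a single dist object, which would explain the "invalid type (list)" message.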
