Manually build SIMPER contrast matrix from dataframe R

Question

I am using the simper function from the vegan package. Briefly, simper compares a set of groups and calculates which variables contribute most to their dissimilarity, and by how much, in a column named cumsum that gives the cumulative contribution. The output is a nested list with one element per between-group contrast, e.g.:

library(vegan)
library(data.table)
library(tidyr)

data(dune)
data(dune.env)
sim <- with(dune.env, simper(dune, Management))
simsum<-summary(sim)

#(short version of output)

$SF_BF
             average          sd     ratio       ava       avb     cumsum
Agrostol 0.061373875 0.034193273 1.7949108 4.6666667 0.0000000 0.09824271
Alopgeni 0.052667124 0.036475863 1.4438897 4.3333333 0.6666667 0.18254830
$SF_HF
             average          sd     ratio       ava avb     cumsum
Agrostol 0.047380081 0.031272715 1.5150613 4.6666667 1.4 0.08350879
Alopgeni 0.046433015 0.032896891 1.4114712 4.3333333 1.6 0.16534834
$SF_NM
             average          sd     ratio       ava       avb    cumsum
Poatriv  0.078284148 0.040947182 1.9118324 4.6666667 0.0000000 0.1013601
Alopgeni 0.071219425 0.046958337 1.5166513 4.3333333 0.0000000 0.1935731

From this, I am interested in 1) the names of each nested list (i.e. which groups are being contrasted), 2) the rownames (i.e. which variables are contributing to the dissimilarity), and 3) the cumsum column (i.e. how much they are contributing).

I would like to turn this into a contrast matrix showing the top 3 contributing variables for each between-group contrast, so that it is easier to read and takes up less room. Here is an example that I made in Excel:

I suspect this is going to be tricky, but this is what I have so far:

top3<-lapply(simsum, `[`,1:3,)#get top 3 contributors
cuss<-lapply(top3, `[`,6)#get last column

rows<-lapply(top3, rownames)#get names from list
rows2<-lapply(cuss, cumsum)#get values from list (cumsum() re-accumulates the already cumulative column)


rowsdf<-do.call(rbind, lapply(rows, data.frame, stringsAsFactors=FALSE))#names into df

cusumdf<-do.call(rbind, lapply(rows2, data.frame, stringsAsFactors=FALSE))#values into df

simperdf<-cbind(rowsdf,cusumdf) #combine into one df

colnames(simperdf)<-c('name','cusum') #change colnames

setDT(simperdf, keep.rownames = TRUE)[]#convert rownames to a column

simperdf<-separate(data = simperdf, col = rn, into = c("left", "right"), sep = "\\_")#separate contrast names
simperdf<-separate(data = simperdf, col = right, into = c("right", "delete"), sep = "\\.")#separate numbers
simperdf$delete<-NULL#delete number column

Which gives this neat little dataframe:

 left right     name      cusum
 1:   SF    BF Agrostol 0.09824271
 2:   SF    BF Alopgeni 0.28079100
 3:   SF    BF Lolipere 0.54036058
 4:   SF    HF Agrostol 0.08350879
 5:   SF    HF Alopgeni 0.24885713
 6:   SF    HF Lolipere 0.48820643
 7:   SF    NM  Poatriv 0.10136013
 8:   SF    NM Alopgeni 0.29493318
 9:   SF    NM Agrostol 0.56167145
10:   BF    HF Rumeacet 0.08163219
11:   BF    HF  Poatriv 0.23357016
12:   BF    HF Planlanc 0.45275349
13:   BF    NM Lolipere 0.12427183
14:   BF    NM  Poatriv 0.32348443
15:   BF    NM  Poaprat 0.59466001
16:   HF    NM  Poatriv 0.09913221
17:   HF    NM Lolipere 0.27381681
18:   HF    NM Rumeacet 0.51298871

But I'm not sure where to go from here. I see that contrasts(dune.env$Management) would give the framework of the matrix:

 HF NM SF
BF  0  0  0
HF  1  0  0
NM  0  1  0
SF  0  0  1

But I'm not sure how to manually fill it. Any help would be greatly appreciated.

You [mostly] can't do double headers in R, so I'm not sure it's possible to build what you want. You could make a nice array: xtabs(cusum ~ left + right + name, df) ...but it's pretty sparse. — alistaire
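
To illustrate the array suggested in the comment above, here is a minimal sketch in base R. The data frame `df` is a hypothetical stand-in holding only the six rows visible in the shortened simper output in the question (contrasts SF_BF, SF_HF and SF_NM), with the column named `cusum` as in the question's own data frame; it is not the full result.

```r
# Hypothetical stand-in: the six rows shown in the shortened simper output
df <- data.frame(
  left  = rep("SF", 6),
  right = rep(c("BF", "HF", "NM"), each = 2),
  name  = c("Agrostol", "Alopgeni", "Agrostol", "Alopgeni", "Poatriv", "Alopgeni"),
  cusum = c(0.09824271, 0.18254830, 0.08350879, 0.16534834, 0.1013601, 0.1935731),
  stringsAsFactors = FALSE
)

# Three-way array: left x right x name, filled with the cusum values
arr <- xtabs(cusum ~ left + right + name, data = df)

dim(arr)                       # one left group, three right groups, three names
arr["SF", "BF", "Agrostol"]    # look up a single contrast/variable cell
ftable(arr)                    # flattened view; absent combinations show as 0
```

Combinations that never occur are filled with 0, which is why the array becomes sparse once every contrast and every variable is included.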

Tobias Dekker · Accepted Answer · 2017-04-25T08:11:06

This is not exactly what you are looking for, but I think it is a step in the right direction:

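library(tables)
# tabular() needs factors for the grouping variables, so request them
# explicitly (R >= 4.0 no longer converts strings to factors by default)
test <- data.frame(left = c("SF", "SF", "BF", "BF"),
                   right = c("BF","BF", "SF", "SF"),
                   name = c("Agrostol", "Alopgeni","Agrostol", "Alopgeni2"),
                   cumv = c(1,2,3,4),
                   stringsAsFactors = TRUE)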
tabular(right * name ~  left * cumv * mean, data = test)

Gives the output:

                 left     
                 BF   SF  
                 cumv cumv
 right name      mean mean
 BF    Agrostol  NaN    1 
       Alopgeni  NaN    2 
       Alopgeni2 NaN  NaN 
 SF    Agrostol    3  NaN 
       Alopgeni  NaN  NaN 
       Alopgeni2   4  NaN
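
As an alternative that stays closer to a group-by-group matrix, the following is a rough sketch in base R rather than a definitive solution. It assumes a long data frame with columns `left`, `right`, `name` and `cusum` like the one in the question; the stand-in `df` below holds only the six rows visible in the shortened simper output, and `groups` is taken from the row names of the `contrasts(dune.env$Management)` output shown in the question. Each cell collects the contributing variables of one contrast as text.

```r
# Hypothetical stand-in: the six rows shown in the shortened simper output
df <- data.frame(
  left  = rep("SF", 6),
  right = rep(c("BF", "HF", "NM"), each = 2),
  name  = c("Agrostol", "Alopgeni", "Agrostol", "Alopgeni", "Poatriv", "Alopgeni"),
  cusum = c(0.09824271, 0.18254830, 0.08350879, 0.16534834, 0.1013601, 0.1935731),
  stringsAsFactors = FALSE
)

# Group labels, as listed by contrasts(dune.env$Management) in the question
groups <- c("BF", "HF", "NM", "SF")

# One text label per row, e.g. "Agrostol (0.098)"
df$label <- sprintf("%s (%.3f)", df$name, df$cusum)

# Rows = left group, columns = right group; labels of a contrast are pasted
# together in their original order
mat <- tapply(
  df$label,
  list(factor(df$left, levels = groups), factor(df$right, levels = groups)),
  paste, collapse = ", "
)

# Contrasts that do not occur come back as NA; show them as empty cells
mat[is.na(mat)] <- ""

print(mat, quote = FALSE)
```

The result is a character matrix, so it is meant for display only; the numeric values stay available in the long data frame.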
