I am using the simper function from the vegan package. Briefly, simper compares a set of groups and calculates which variables contribute most to their dissimilarity, and by how much, in a column named cumsum that gives the cumulative contribution. The output is a nested list with one set of results per between-group contrast, e.g.:
library(vegan)
library(data.table)
library(tidyr)
data(dune)
data(dune.env)
sim <- with(dune.env, simper(dune, Management))
simsum <- summary(sim)
# (short version of output)
$SF_BF
             average          sd     ratio       ava       avb     cumsum
Agrostol 0.061373875 0.034193273 1.7949108 4.6666667 0.0000000 0.09824271
Alopgeni 0.052667124 0.036475863 1.4438897 4.3333333 0.6666667 0.18254830
$SF_HF
             average          sd     ratio       ava avb     cumsum
Agrostol 0.047380081 0.031272715 1.5150613 4.6666667 1.4 0.08350879
Alopgeni 0.046433015 0.032896891 1.4114712 4.3333333 1.6 0.16534834
$SF_NM
             average          sd     ratio       ava       avb    cumsum
Poatriv  0.078284148 0.040947182 1.9118324 4.6666667 0.0000000 0.1013601
Alopgeni 0.071219425 0.046958337 1.5166513 4.3333333 0.0000000 0.1935731
From this, I am interested in 1) the names of the nested list elements (i.e. which groups are being contrasted), 2) the rownames (i.e. which variables are contributing to the dissimilarity), and 3) the cumsum column (i.e. how much they are contributing, cumulatively).
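A minimal sketch of pulling those three pieces out in one pass. It assumes each list element behaves like a data frame with rownames and a cumsum column; sim_mock is a hypothetical stand-in for summary(sim), built only from the first two rows printed above so that it runs without vegan.

```r
# Hypothetical stand-in for summary(sim): two contrasts, first two rows each,
# with values copied from the short output above.
sim_mock <- list(
  SF_BF = data.frame(
    average = c(0.061373875, 0.052667124),
    cumsum  = c(0.09824271, 0.18254830),
    row.names = c("Agrostol", "Alopgeni")
  ),
  SF_HF = data.frame(
    average = c(0.047380081, 0.046433015),
    cumsum  = c(0.08350879, 0.16534834),
    row.names = c("Agrostol", "Alopgeni")
  )
)

# One long data frame: contrast name, variable name, cumulative contribution.
long <- do.call(rbind, lapply(names(sim_mock), function(nm) {
  x <- sim_mock[[nm]]
  data.frame(contrast = nm, name = rownames(x), cumsum = x$cumsum,
             stringsAsFactors = FALSE)
}))
print(long)
```

Iterating over names() rather than over the list itself keeps the contrast label available inside the function, so nothing has to be recovered from rownames afterwards.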
I would like to turn this into a contrast matrix showing the top 3 contributing variables for each between-group contrast, so that it is easier to read and doesn't take up so much room. Here is an example that I made in Excel:
I suspect this is going to be tricky, but this is what I have so far:
top3 <- lapply(simsum, `[`, 1:3, )  # top 3 contributors per contrast
cuss <- lapply(top3, `[`, 6)  # cumsum column (column 6)
rows <- lapply(top3, rownames)  # variable names per contrast
rows2 <- lapply(cuss, cumsum)  # NOTE: column 6 is already cumulative, so this accumulates it a second time
rowsdf <- do.call(rbind, lapply(rows, data.frame, stringsAsFactors = FALSE))  # names into a data frame
cusumdf <- do.call(rbind, lapply(rows2, data.frame, stringsAsFactors = FALSE))  # values into a data frame
simperdf <- cbind(rowsdf, cusumdf)  # combine into one data frame
colnames(simperdf) <- c('name', 'cusum')  # rename columns
setDT(simperdf, keep.rownames = TRUE)[]  # convert rownames to a column (rn)
simperdf <- separate(data = simperdf, col = rn, into = c("left", "right"), sep = "\\_")  # separate contrast names
simperdf <- separate(data = simperdf, col = right, into = c("right", "delete"), sep = "\\.")  # split off the row index
simperdf$delete <- NULL  # drop the index column
Which gives this neat little dataframe:
left right name cusum
1: SF BF Agrostol 0.09824271
2: SF BF Alopgeni 0.28079100
3: SF BF Lolipere 0.54036058
4: SF HF Agrostol 0.08350879
5: SF HF Alopgeni 0.24885713
6: SF HF Lolipere 0.48820643
7: SF NM Poatriv 0.10136013
8: SF NM Alopgeni 0.29493318
9: SF NM Agrostol 0.56167145
10: BF HF Rumeacet 0.08163219
11: BF HF Poatriv 0.23357016
12: BF HF Planlanc 0.45275349
13: BF NM Lolipere 0.12427183
14: BF NM Poatriv 0.32348443
15: BF NM Poaprat 0.59466001
16: HF NM Poatriv 0.09913221
17: HF NM Lolipere 0.27381681
18: HF NM Rumeacet 0.51298871
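One thing worth checking in this table: the cusum values in rows 1 and 2 look like the simper cumsum values accumulated a second time. A small arithmetic check, using only the two SF_BF values printed in the short output earlier (the variable names cs, twice and individual are illustrative):

```r
# First two cumsum values of the SF_BF contrast, copied from the simper output.
cs <- c(0.09824271, 0.18254830)

# Applying cumsum() again gives, up to rounding, the values in rows 1 and 2
# of the data frame above (0.09824271 and 0.28079100).
twice <- cumsum(cs)
print(twice)

# Differencing the cumulative column instead recovers the step added by
# each variable.
individual <- diff(c(0, cs))
print(individual)
```

If the cumulative contribution itself is what should appear in the matrix, the sixth column can presumably be used directly without a further cumsum() call.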
But I'm not sure where to go from here. I see that contrasts(dune.env$Management) would give the framework of the matrix:
HF NM SF
BF 0 0 0
HF 1 0 0
NM 0 1 0
SF 0 0 1
But I'm not sure how to manually fill it. Any help would be greatly appreciated.
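One possible way to fill such a matrix, sketched in base R. The Excel layout is not reproduced here, so the cell format ("name (value)" separated by commas), the group order, and the choice to fill only the left-by-right cells are all assumptions. The data frame is retyped from the table above so the block runs on its own.

```r
# Top-3 table copied from the data frame shown above (values as printed there).
simperdf <- data.frame(
  left  = rep(c("SF", "SF", "SF", "BF", "BF", "HF"), each = 3),
  right = rep(c("BF", "HF", "NM", "HF", "NM", "NM"), each = 3),
  name  = c("Agrostol", "Alopgeni", "Lolipere",
            "Agrostol", "Alopgeni", "Lolipere",
            "Poatriv",  "Alopgeni", "Agrostol",
            "Rumeacet", "Poatriv",  "Planlanc",
            "Lolipere", "Poatriv",  "Poaprat",
            "Poatriv",  "Lolipere", "Rumeacet"),
  cusum = c(0.09824271, 0.28079100, 0.54036058,
            0.08350879, 0.24885713, 0.48820643,
            0.10136013, 0.29493318, 0.56167145,
            0.08163219, 0.23357016, 0.45275349,
            0.12427183, 0.32348443, 0.59466001,
            0.09913221, 0.27381681, 0.51298871),
  stringsAsFactors = FALSE
)

# One label per row, e.g. "Agrostol (0.10)".
simperdf$label <- sprintf("%s (%.2f)", simperdf$name, simperdf$cusum)

# Empty group-by-group character matrix; the group order is an assumption.
groups <- c("SF", "BF", "HF", "NM")
m <- matrix("", nrow = length(groups), ncol = length(groups),
            dimnames = list(groups, groups))

# Collapse the three labels of each contrast into a single string.
cells <- aggregate(label ~ left + right, data = simperdf,
                   FUN = paste, collapse = ", ")

# Fill cells by (row name, column name) using a two-column character index.
m[cbind(as.character(cells$left), as.character(cells$right))] <- cells$label
print(m)
```

Cells with no contrast stay as empty strings, which avoids the sparsity of a three-way cross-tabulation because the variable names live inside the cell text rather than in a third dimension.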
xtabs(cusum ~ left + right + name, df) ...but it's pretty sparse. – alistaire