I've used the function betadisper()
in the vegan package to generate multivariate dispersions and plot those data in a PCoA. In this example I'll be looking at the difference between the sexes in a singular species.
Load the original data. For our purposes this can legit be anything here. The data I'm using isn't special. Its feature measurements are from a bioacoustic dataset. I am walking through my process:
my_original_data = read.csv("mydata.csv", as.is = T, check.names = F)
#Just extract the numeric/quantitative data.
myData=my_original_data[, 13:107]
Based on previous research, we used an unsupervised randomForest to determine similarity within our original feature measurements:
require(randomForest)
full_urf = randomForest(myData, proximity=T, scale=TRUE, ntree=4999,importance = TRUE)
A index was then generated using the proximity matrix:
urf_dist_full = as.dist(1-full_urf$proximity)
An permutational MANOVA was run on the resulting index using the vegan package. The use of the pMANOVA was well researched and is the correct test for my purposes:
mod=adonis(formula = urf_dist_full ~ Sex * Age * Variant, data = my_original_data, permutations = 999, method = "euclidean")
my_original_data had qualitative factors, Sex, Age and Variant. I could have extracted them, but it seemed cleaner to keep them within the original dataset.
After running a few homogeneity tests, I want to plot the multivariate dispersions. To do this I have been using the betadisper function:
Sex=betadisper(urf_dist_full,my_original_data$Sex)
plot(Sex, main="Sex Multivariate Dispersions")
That plots this beauty:
How can I label the centroids as Male and Female? I also want to run this plot for the Variant category, but that has five factors rather than two, which really warrants labeling.
I've seen the boxplot()
variant of this, but I like how the PCoA also shows clustering.