I do have two sets of data: Abundance data and environmental data and need to "link" them or "overlay" them in a PCA:
I want to conduct a PCA in R which gives me the individuals Ciliate species as Individuals and the environmental parameters as variables. What I do have are two different dataframes. "abundance" where the abundance of the different species at the sampled sites are given and "environment" where the environmental parameters of the sampled sites are given. So I have one parameter in common in eacht dataframe: the sites! If i conduct PCAs I either get a plot with the sites as individuals and the environmental parameters as variables or a plot with the species as individuals and the sites as variables. What I need, would be to link the datasets by the parameter site so that I can conduct a PCA with the ciliate species as individuals ans the environmental parameters as variables. So i would somehow need to link the two PCAs/dataframes over the common parameter site. What I did so far- i did two different PCAs and rememberd the coordinates of the individuals (ciliate species) of PCA1 and the coordinates of the variables of PCA2 (environmental parameters) and biploted them-> The graph is exactly what I would need, but is it still interpretable as a PCA, so the dataframes really linked by the site parameter? Or did simply trick with the data and loose the interpretability?
Another option I tried is to calculate the environmental paramteres of each ciliate species by weighted mean (weighted by the abundance of the ciliates at the site) and conduct a PCA on a dataframe with ciliate species and the weighted mean of the environmental parameters...Which works but I think I loose a lot of information on this way...What do you think?
#Create random dataframe of abundance data, I am sure this can be done simpler and more elegant than this ;)
species<-c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")
site1<-c(2,4,19,34,3,6,9)
site2<-c(5,8,9,12,0,1,1)
site3<-c(23,56,7,1,1,1,2)
site4<-c(4,6,2,8,5,1,7)
abundance<-data.frame(species,site1,site2,site3,site4)
rownames(abundance)<-abundance$species
abundance<-abundance[,-1]
#Create random dataframe of abundance data
#environmental parameters of the sites
X<-c("site1","site2","site3","site4")
Temp<-c(24,24.5,23.5,25)
Chla<-c(2.2,1.5,2.0,3.4)
Plo<-c(1000,2000,1500,200)
Plo2<-c(200,400,600,200)
environment<-data.frame(X,Temp,Chla,Plo,Plo2)
rownames(environment)<-environment$X
environment<-environment[,-1]
###PCA on abundance data
#hellinger pre-transformation of abundance data
library(vegan)
abu.h<-decostand(abundance,"hellinger")
abu.h.pca<-prcomp(abu.h)
envir.pca<-prcomp(environment,scale=TRUE)
biplot(abu.h.pca)
##and now I would need to discard the sites vectors and overlay it with
#the environmental sites factors, due to my prof?
#Graph of individuals
fviz_pca_ind(abu.h.pca)
##get coordinates
library(factoextra)
ind<-get_pca_ind(abu.h.pca)
head(ind$coord)
#x in biplot
ind<-ind$coord
ind<-ind[,1:2]
ind
#y variables
# Extract the results for variables only
vari<-get_pca_var(abu.h.pca)
var<-vari$coord
var<-var[,1:2]
var
biplot(ind, var, var.axes = TRUE)