
I have a data set with 27 variables (columns) and 125 samples (rows). I am trying to run a PCA using this script:

library("FactoMineR")
library("factoextra")
library(ggplot2)
mydata <- read.csv("Overall.csv", header = TRUE, sep = ",")
mydata[is.na(mydata)] <- 0  # every missing value is replaced with zero
attach(mydata)
X=cbind (Adj..1_12, Adj..1_13, Adj..1_2, Adj..10_11, Adj..11_12, Adj..12_13, Adj..2_11, Adj..2_12, Adj..2_13,
         Adj..2_3, Adj..3_11, Adj..3_12, Adj..3_4, Adj..4_11, Adj..4_5, Adj..5_10, Adj..5_11, Adj..5_6,
         Adj..6_10, Adj..6_11, Adj..6_7, Adj..6_9, Adj..7_10, Adj..7_8, Adj..7_9, Adj..8_9, Adj..9_10)
summary(X)
cor(X)
res.pca <- princomp(X, scores=TRUE, cor=TRUE)
summary(res.pca)


fviz_pca_var(res.pca, col.var="contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE, # Avoid text overlapping
             axes = c(1, 2) # choose PCs to plot
)

However, I get different results from those produced by Rcmdr and SPSS. In the PCA done with SPSS and the Rcmdr package, the first (PC1) and second (PC2) principal components explain 81.94% and 4.9% of the variance, respectively, whereas the script above produces a PCA plot in which they explain only 46.7% and 5.8%. Could this be due to the larger number of samples (125 rows)?

The same script gives results that match SPSS for other, similar data sets with fewer rows. Where is the mistake?

Many thanks

Best regards,

Umar

Access to the raw data

Results provided by Rcmdr and SPSS

Results provided by princomp() and prcomp()

Maybe related: have you scaled your data? stackoverflow.com/a/60759060/7941188 - tjebo
Hi @Tjebo, yes I did. I want a result similar to SPSS using the princomp method. - Umar KHAN

1 Answer


I finally found the mistake. I had removed outliers from the data, which left missing cells. SPSS and Rcmdr (in R) replace these missing cells with mean values, whereas the script I used (given above) replaces them with zero, which resulted in an entirely different PCA estimate.
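
For anyone hitting the same problem, here is a minimal sketch of the difference. It assumes that "mean values" means the mean of each column, and it uses a small made-up data frame (`toy`) rather than Overall.csv; the helper `impute_mean` is a hypothetical name, not part of any package.

```r
# Hypothetical toy data standing in for Overall.csv (values are made up)
set.seed(1)
toy <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
toy$b <- toy$a + 0.1 * toy$b      # make a and b correlated
toy[c(3, 7), "a"] <- NA           # simulate removed outliers
toy[c(5, 12), "c"] <- NA

# Replace each NA with the mean of its own column (assumed imputation rule)
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}
toy_mean <- as.data.frame(lapply(toy, impute_mean))

# Zero fill, as in the script from the question
toy_zero <- toy
toy_zero[is.na(toy_zero)] <- 0

# Same PCA call on both versions of the data
pca_mean <- princomp(toy_mean, cor = TRUE)
pca_zero <- princomp(toy_zero, cor = TRUE)

# Proportion of variance explained by each component
var_mean <- pca_mean$sdev^2 / sum(pca_mean$sdev^2)
var_zero <- pca_zero$sdev^2 / sum(pca_zero$sdev^2)
print(round(var_mean, 3))
print(round(var_zero, 3))
```

On the real data, and assuming every column is numeric, the zero-fill line in the question could be swapped for something like `mydata[] <- lapply(mydata, impute_mean)` before building `X`.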