
I'm working on a survey with 288 observations in total (108 complete responses used) and around 200 variables. I'm trying to reduce the number of variables with principal component analysis (PCA) in R.

Suppose that 3 items, measured on a 7-point Likert scale and stored in a sub-dataset called tmtformalizM, should theoretically reduce to a single component (based on the literature review). This is the output of a PCA on the correlation matrix with an orthogonal (varimax) rotation:
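For readers without psych, here is a minimal base-R sketch of what such an extraction does: eigen-decompose the item correlation matrix, scale the retained eigenvectors by the square roots of their eigenvalues to get loadings, then apply `stats::varimax`. The data frame `items` and the helper `make_item` are hypothetical stand-ins for tmtformalizM (simulated, not the survey data), and psych::principal may differ in details such as component order, signs and rotation normalization, so treat this as an illustration rather than a reimplementation.

```r
# Hypothetical stand-in for tmtformalizM: 108 respondents, 3 items on a
# 7-point Likert scale (simulated here; NOT the original survey data)
set.seed(1)
n <- 108
latent <- rnorm(n)
make_item <- function(signal) {
  raw <- round(4 + 1.2 * (signal * latent + rnorm(n)))
  pmin(pmax(raw, 1), 7)  # clamp to the 1..7 Likert range
}
items <- data.frame(item1 = make_item(0.8),
                    item2 = make_item(0.5),
                    item3 = make_item(0.1))

# PCA on the correlation matrix: loadings = eigenvectors * sqrt(eigenvalues)
R <- cor(items)
e <- eigen(R, symmetric = TRUE)
k <- 2  # number of components kept, as in the question
unrotated <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
rownames(unrotated) <- colnames(items)

# Orthogonal (varimax) rotation of the retained loadings
rot <- varimax(unrotated)
loadings_rot <- unclass(rot$loadings)
colnames(loadings_rot) <- paste0("RC", 1:k)

# Communalities (h2) and uniquenesses (u2), as in psych's printout
h2 <- rowSums(loadings_rot^2)
u2 <- 1 - h2
round(cbind(loadings_rot, h2 = h2, u2 = u2), 2)
```

Because the rotation is orthogonal, it redistributes variance between the two components but leaves each item's h2 unchanged.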

Principal Components Analysis
Call: principal(r = tmtformalizM, nfactors = 2, rotate = "varimax", 
scores = T)
Standardized loadings (pattern matrix) based upon correlation matrix
                  RC1   RC2   h2   u2
invapproccio     0.89 -0.11 0.81 0.19
invformacomunic  0.60  0.53 0.64 0.36
verbali         -0.07  0.91 0.84 0.16

                       RC1  RC2
SS loadings           1.16 1.12
Proportion Var        0.39 0.37
Cumulative Var        0.39 0.76
Proportion Explained  0.51 0.49
Cumulative Proportion 0.51 1.00

Test of the hypothesis that 2 components are sufficient.

The degrees of freedom for the null model are  3  and the objective function was  0.09
The degrees of freedom for the model are -2  and the objective function was  0.74
The total number of observations was  108  with MLE Chi Square =  77.17  with prob <  NA 

Fit based upon off diagonal values = -0.86
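The summary rows of this printout can be reproduced from the loadings alone, which helps when reading it. The sketch below copies the rotated loadings from the output and recomputes h2, SS loadings and the proportions; it also applies the degrees-of-freedom formula that matches the printed values (3 for the null model, -2 for 2 components), assuming psych uses this formula.

```r
# Rotated loadings copied from the principal() printout in the question
L <- matrix(c( 0.89, -0.11,
               0.60,  0.53,
              -0.07,  0.91),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("invapproccio", "invformacomunic", "verbali"),
                            c("RC1", "RC2")))
p <- nrow(L)  # number of standardized items
k <- ncol(L)  # number of retained components

h2        <- rowSums(L^2)      # communality of each item (column h2)
ss        <- colSums(L^2)      # "SS loadings"
prop_var  <- ss / p            # "Proportion Var"
cum_var   <- cumsum(prop_var)  # "Cumulative Var"
prop_expl <- ss / sum(ss)      # "Proportion Explained"
round(rbind(ss, prop_var, cum_var, prop_expl), 2)

# Degrees of freedom: p(p - 1)/2 distinct correlations, minus p*k loadings,
# plus k(k - 1)/2 for the rotational freedom of the components
null_df  <- p * (p - 1) / 2
model_df <- p * (p - 1) / 2 - p * k + k * (k - 1) / 2
c(null = null_df, model = model_df)  # null = 3, model = -2
```

A negative df means the 2-component model has more free parameters than there are correlations to reproduce, which is consistent with the NA probability in the printout. With k = 1 the same formula gives 0, so with only 3 items even a single component leaves nothing to test. The recomputed h2 values (0.80, 0.64, 0.83) differ slightly from the printed ones only because the loadings are rounded to two decimals.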

The extraction yields 2 components, with a terrible fit (how can the fit be negative?). The Cronbach's alpha of the first component, which contains the first two items, is very low (0.35).
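One way to see how such a fit index can go negative: assuming psych's "fit based upon off diagonal values" has the form 1 - sum(residual^2) / sum(r^2) over the off-diagonal correlations, it drops below zero whenever the reproduced correlations miss the observed ones by more than the observed correlations themselves amount to. PCA tends to do this with weakly correlated items, because the components also absorb each item's unique variance. The sketch uses a made-up 3x3 correlation matrix (not the survey data), and `fit_off` is a hypothetical helper, not a psych function.

```r
# Hypothetical helper: off-diagonal fit of a k-component PCA of R,
# 1 - sum(off-diagonal residual^2) / sum(off-diagonal r^2)
fit_off <- function(R, k) {
  e <- eigen(R, symmetric = TRUE)
  L <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
  resid <- R - L %*% t(L)  # an orthogonal rotation leaves L %*% t(L) unchanged
  off <- upper.tri(R)
  1 - sum(resid[off]^2) / sum(R[off]^2)
}

# Made-up correlation matrix: items 1 and 2 correlate 0.2, item 3 with neither
R_toy <- matrix(c(1.0, 0.2, 0.0,
                  0.2, 1.0, 0.0,
                  0.0, 0.0, 1.0), nrow = 3)

fit_off(R_toy, 2)  # -3: r12 = 0.2 is reproduced as 0.6, a residual of -0.4
fit_off(R_toy, 3)  #  1: with as many components as items nothing is left over
```

The same mechanism plausibly applies to the question's -0.86. The printed loadings imply a reproduced invapproccio/invformacomunic correlation of 0.89 * 0.60 + (-0.11) * 0.53, about 0.48, whereas, if the two items have similar variances, an alpha of 0.35 corresponds to an observed correlation of about 0.21 (from alpha = 2r / (1 + r)).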

My question is: in this case I need to drop the first component identified by the analysis, but should I keep as the final variable the PCA scores of the second component (on which item 3 loads) or the original survey values of item 3?
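To compare the two options concretely, one can compute the rotated component scores and correlate the component that carries item 3 with the raw item. The sketch below is base R on simulated, hypothetical data (`items`, with `item3` standing in for verbali). It uses the regression weights R^-1 L, which for principal components give exact standardized scores. Dropping a component is then just a matter of selecting the column you keep; with psych, assuming the fitted object is called `pc` and the columns are named as in the printout, that would be something like `pc$scores[, "RC2"]`.

```r
set.seed(2)
n <- 108
f1 <- rnorm(n); f2 <- rnorm(n)
# Hypothetical 3-item data: items 1-2 share f1, item 3 mostly reflects f2
items <- data.frame(item1 = f1 + rnorm(n, sd = 0.8),
                    item2 = 0.6 * f1 + 0.5 * f2 + rnorm(n, sd = 0.8),
                    item3 = f2 + rnorm(n, sd = 0.6))

R <- cor(items)
e <- eigen(R, symmetric = TRUE)
k <- 2
L <- unclass(varimax(e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k])))$loadings)

# Regression weights R^-1 L give exact, standardized component scores for PCA
Z <- scale(items)
scores <- Z %*% solve(R, L)
colnames(scores) <- paste0("RC", 1:k)

# Keep only the component on which item 3 has its largest loading
keep <- which.max(abs(L[3, ]))
rc_item3 <- scores[, keep]

# How close is that component score to the raw item? (the sign is arbitrary)
cor(rc_item3, items$item3)
```

A correlation close to 1 in absolute value means the choice matters little; a clearly lower value means the component score also mixes in the other items (in the question, invformacomunic loads 0.53 on RC2), which is an argument for keeping the raw item if item 3 alone is what you want to measure.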

Also, consider a PCA where 2 components (with 3 items each) are extracted and the first component has very low reliability (while the second has an alpha > 0.8).

In this case, should I re-run the PCA only on the items of the second component and use those scores as the final variable, or just keep the second component's scores from the first PCA?
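Both options can be computed and compared directly. The sketch below, in base R on simulated, hypothetical data, defines two hypothetical helpers (`cronbach_alpha` using the textbook formula, and `pca_scores` for standardized PCA scores with varimax when more than one component is kept). It builds 6 items in which a1-a3 are designed to be weakly related and b1-b3 strongly related, then correlates the b-component scores from the 2-component solution with the scores from a 1-component PCA re-run on b1-b3 only.

```r
# Cronbach's alpha: k/(k - 1) * (1 - sum of item variances / variance of total)
cronbach_alpha <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)
  k / (k - 1) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}

# Standardized PCA scores (regression weights) for k components, varimax if k > 1
pca_scores <- function(x, k) {
  R <- cor(x)
  e <- eigen(R, symmetric = TRUE)
  L <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
  if (k > 1) L <- unclass(varimax(L)$loadings)
  scale(x) %*% solve(R, L)
}

set.seed(3)
n <- 108
g1 <- rnorm(n); g2 <- rnorm(n)
# Hypothetical items: a1-a3 weakly tied to g1, b1-b3 strongly tied to g2
x <- data.frame(a1 = 0.4 * g1 + rnorm(n), a2 = 0.4 * g1 + rnorm(n),
                a3 = 0.4 * g1 + rnorm(n),
                b1 = g2 + rnorm(n, sd = 0.5), b2 = g2 + rnorm(n, sd = 0.5),
                b3 = g2 + rnorm(n, sd = 0.5))

c(alpha_a = cronbach_alpha(x[, 1:3]), alpha_b = cronbach_alpha(x[, 4:6]))

# Option 1: keep the b-component from the original 2-component solution
both <- pca_scores(x, 2)
b_col <- which.max(rowSums(cor(both, x[, 4:6])^2))
score_2comp <- both[, b_col]

# Option 2: re-run a 1-component PCA on the retained items only
score_1comp <- pca_scores(x[, 4:6], 1)[, 1]

cor(score_2comp, score_1comp)
```

The re-run score uses only the retained items, whereas the original component score still gives some weight to the dropped ones; how much that matters for a given dataset is an empirical question that this correlation helps answer.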

Thanks

This is more a statistical problem than a programming issue, isn't it? - user3710546
Yes, you are right. The only programming issue I found is how to actually drop the low-reliability components... but indeed, it is mainly theoretical. - Niccolo
I agree this is not easy, since you are using R, but the questions are mainly statistical. - user3710546



If you think that the three items should form one component, why are you extracting 2? To understand the problem with the fit, look at the residuals of the solution:

    library(psych)
    pc1 <- principal(tmtformalizM)  # nfactors defaults to 1
    resid(pc1)

I do not understand your second question.