How to use PCA model to predict scores on new data in Stata?

Question

My question is similar to "R: using predict() on new data with high dimensionality", but for Stata.

I want to run a principal components analysis (pca) on one subset of the data (the control group from an experiment) to extract the first component. Then I want to apply that same model, without re-estimating it, to a separate subset (the treatment group) and get scores for those observations as well. Essentially, I want to use a PCA model fitted on dataset_1 to predict scores in a new dataset_2.

In R, one would fit the model to the control group only and then call predict() on the fitted model, passing the full data set as the newdata argument. This generates predictions for all observations from a model fitted on the control group only. How does one do this in Stata?

global xlist2a std_agreedisagree1_1_a std_revagreedisagree1_2_a std_revagreedisagree1_3_a std_agreedisagree1_4_a std_revagreedisagree1_10_a std_revagreedisagree1_5_a 
pca $xlist2a
screeplot, yline(1)     
rotate, clear       
pca $xlist2a, com(3) 
rotate, varimax blanks (.30) 
predict pca5_p1b pca5_p2b pca5_p3b, score

Fixed code based on Nick's answer:

global xlist2a std_agreedisagree1_1_a std_revagreedisagree1_2_a std_revagreedisagree1_3_a std_agreedisagree1_4_a std_revagreedisagree1_10_a std_revagreedisagree1_5_a 
pca $xlist2a if zgroupa10==1 
screeplot, yline(1)     
rotate, clear       
pca $xlist2a if zgroupa10==1, com(3) 
rotate, varimax blanks (.30) 
predict pca5_p1b pca5_p2b pca5_p3b, score

Thanks for adding code, but the original code above runs pca on all observations for the chosen variables and then predicts for all observations. That's not what you should do, but your comment below my answer implies that your real code applied the approach needed. — Nick Cox

Nick Cox · Accepted Answer · 2016-11-22T17:43:01

What code did you try? The simplest of experiments shows that the same approach works in Stata too:

. sysuse auto, clear
(1978 Automobile Data)

. pca headroom trunk length displacement if foreign

Principal components/correlation                 Number of obs    =         22
                                                 Number of comp.  =          4
                                                 Trace            =          4
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.93666      .656823             0.4842       0.4842
           Comp2 |      1.27983      .615381             0.3200       0.8041
           Comp3 |      .664453      .545396             0.1661       0.9702
           Comp4 |      .119057            .             0.0298       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    --------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 | Unexplained 
    -------------+----------------------------------------+-------------
        headroom |   0.0288    0.7373    0.6749    0.0083 |           0 
           trunk |   0.2443    0.6496   -0.7199   -0.0090 |           0 
          length |   0.6849   -0.1313    0.1229   -0.7061 |           0 
    displacement |   0.6858   -0.1313    0.1054    0.7080 |           0 
    --------------------------------------------------------------------

. predict score1 score2 if !foreign
(score assumed)
(2 components skipped)

Scoring coefficients 
    sum of squares(column-loading) = 1

    ------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 
    -------------+----------------------------------------
        headroom |   0.0288    0.7373    0.6749    0.0083 
           trunk |   0.2443    0.6496   -0.7199   -0.0090 
          length |   0.6849   -0.1313    0.1229   -0.7061 
    displacement |   0.6858   -0.1313    0.1054    0.7080 
    ------------------------------------------------------

.
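The log above splits a single dataset with if. If, as in the question, the control and treatment data live in separate files, one possible approach (a sketch, not the only one) is to append the files, fit on the control rows, and predict for all rows. Here the auto data are split into two temporary files standing in for the hypothetical dataset_1 and dataset_2; the tempfile names, the source variable, and the choice of two components are illustrative assumptions.

```stata
* Sketch: fit PCA on one file, then score observations from another file.
* The auto data are split into two temporary files that stand in for the
* hypothetical dataset_1 (control) and dataset_2 (treatment).
sysuse auto, clear
tempfile control treatment
preserve
keep if foreign
save `control'
restore
keep if !foreign
save `treatment'

* Stack the two files; generate() flags where each observation came from
* (0 = master file, i.e. control; 1 = appended file, i.e. treatment).
use `control', clear
append using `treatment', generate(source)

* Fit the model on the control observations only ...
pca headroom trunk length displacement if source == 0, components(2)

* ... then score every observation in memory with that fitted model.
predict pc1 pc2, score

* Scores now exist for both groups.
assert !missing(pc1, pc2)
```

Because the tempfile names are local macros, run the block in one go from a do-file. The key point is the same as in the log: the if qualifier on pca restricts estimation, while predict without if scores every observation in memory.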
