1
votes

I am conducting a principal component analysis in R on vectors with missing data. I want to extract the score from the principal component and match the values with the observations that are not missing in the original frame but I can't figure out how to extract and match on the right identifiers. For example:

x1 <- c(1,2,3,NA, 5,6,7)
x2 <- c(7,NA,6,NA, 4,3,2)

frame <- cbind(x1,x2)

pca_ob <- princomp(~frame)
pca_ob$scores[,1]

This produces the following output:

    1         3         5         6         7 
  4.273146  2.104705 -0.715732 -2.125950 -3.536168 

I would like to bind pca_ob$scores[,1] to the original frame as a new column x3, matched on the identifiers, with NAs filled in for the rows that have no score, so that it produces the following matrix:

    x1 x2 x3
1    1  7  4.273146
2    2  NA NA
3    3  6  2.104705
4    NA NA NA
5    5  4  -0.715732
6    6  3  -2.125950
7    7  2  -3.536168

This takes the first set of scores and matches them back to the frame by row, with NAs in every row that has no PCA score. Any thoughts? Thanks.

2
I can see you are trying, but it still isn't very clear what you are asking. Perhaps you could provide a short sample input and the output you want to achieve. Usually the R PCA routines will give you back a rotation matrix, scales, and means that can be used to "go backwards" from the PCs to the data, or from new data to PCs. - Paul
Hi Paul, I just edited it, so hopefully this explains the question more clearly. Thanks - coding_heart
I did goodFrame <- na.omit(frame) and obtained the same PCA scores from goodFrame, so R is dropping your missing data completely for the purpose of calculating the PCA. - Paul
Indeed, that is what's happening, and it is not a problem. I just want to rebind the vector of PCA scores to the vectors with NAs, matching them on their identifier. In the output above, you can see that the PCA produces values for elements 1, 3, 5, 6, 7. My question is how to match those to elements 1, 3, 5, 6, 7 in the other vectors and introduce NAs for elements 2 and 4. - coding_heart
Unfortunately, what should be really basic data manipulation in R is often a maze of twisty little passages. - Paul

2 Answers

2
votes

This feels like a bit of a hack. There may be a better solution out there somewhere.

The method here is to create a new object that is initially full of NAs, and then turn the names of the sparse data into numeric indexes and assign using those.

> p1 <- pca_ob$scores[,1]
> p1
        1         3         5         6         7 
 4.273146  2.104705 -0.715732 -2.125950 -3.536168 
> z<-rep(NA, 7)
> z[as.numeric(names(p1))]<-p1
> z
[1]  4.273146        NA  2.104705        NA -0.715732 -2.125950 -3.536168
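
The same idea can be sketched without hard-coding the length 7 or converting names to numbers. This is only a sketch: it assumes princomp is running with the default NA handling, which drops every row containing an NA (as noted in the comments), so the scores line up with the complete rows of frame. The names `keep` and `result` are made up for the illustration.

```r
# Data from the question
x1 <- c(1, 2, 3, NA, 5, 6, 7)
x2 <- c(7, NA, 6, NA, 4, 3, 2)
frame <- cbind(x1, x2)

pca_ob <- princomp(~frame)
p1 <- pca_ob$scores[, 1]

# Assumption: the rows princomp kept are exactly the complete cases
keep <- complete.cases(frame)

# Start with all NAs, then fill in the scores for the rows that were kept
x3 <- rep(NA_real_, nrow(frame))
x3[keep] <- p1

# Bind the padded scores onto the original matrix as column x3
result <- cbind(frame, x3)
result
```

Rows 2 and 4 keep NA in x3, and the other rows receive the scores in their original order.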
1
votes

I think you're looking for na.exclude:

> princomp(~frame, na.action = na.exclude)$scores
     Comp.1      Comp.2
1  4.273146  0.24540178
2        NA          NA
3  2.104705 -0.30036459
4        NA          NA
5 -0.715732 -0.08790757
6 -2.125950  0.01832094
7 -3.536168  0.12454944

I found this in the help page for na.omit (which covers the other NA actions as well), which is linked from princomp's na.action argument description.
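
To get the matrix asked for in the question, the padded scores can then be bound straight onto the frame. A minimal sketch, using the data from the question; the names `scores` and `result` are only for illustration:

```r
# Data from the question
x1 <- c(1, 2, 3, NA, 5, 6, 7)
x2 <- c(7, NA, 6, NA, 4, 3, 2)
frame <- cbind(x1, x2)

# na.exclude pads the scores with NA rows for the observations that were dropped,
# so the score matrix has one row per row of frame
scores <- princomp(~frame, na.action = na.exclude)$scores

# Bind the first component onto the original matrix as column x3
result <- cbind(frame, x3 = scores[, 1])
result
```

Because the scores already have one row per original observation, no matching on names is needed.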