
I'm using factanal in R to reduce a 30-variable dataset down to 7 factors, and then using the factor scores it outputs (from fitted_data$scores) in an lm model. So far, so straightforward...

However, the independent variables I'm using are lagged one period relative to my dependent variable (as the model is hopefully going to predict the future). I now have all 30 input variables I need to predict next period's value of the dependent variable, so my question is this: how do I use the factanal output from the work I've already done to calculate the 7 factor scores for these 30 new variables? Once I have these, I can use the lm model to predict the next period.

Example of the code I'm using is below (the target variable is in the first column of mydata):

# extract 7 factors from the predictors (column 1 is the target)
fitted_data <- factanal(mydata[, -1], 7, rotation = "varimax", lower = 0.05, scores = "regression")

# add the factor scores (columns Factor1 ... Factor7) back to the main dataset
mydata <- cbind(mydata, fitted_data$scores)

# linear regression model to predict Target_Var using the extracted factors
mod1 <- lm(Target_Var ~ Factor1 + Factor2 + Factor3 + Factor4 + Factor5 + Factor6 + Factor7, data = mydata)

I have the latest 30 independent variables in a dataset called new_data, and I just want to calculate the 7 factor scores using the factor loadings already calculated, but I can't for the life of me figure out how...

Any help greatly appreciated.

Hey, are you able to share something regarding your data, please? e.g. dput(head(mydata)) & dim(mydata) would be useful for replication. - Jonny Phelps



Solution is here: https://stat.ethz.ch/pipermail/r-help/2002-April/020278.html
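The linked function works because factanal's regression (Thomson) scores are a fixed linear map of the standardised data, so a fitted model can be reapplied to new rows. As a sketch for the varimax (orthogonal) setup used here: write $\bar{x}_j$ and $s_j$ for the training mean and standard deviation of variable $j$, $R$ for the correlation matrix stored in `fitted_data$correlation`, and $\Lambda$ for the rotated loadings in `fitted_data$loadings`:

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad \hat{F} = Z \, R^{-1} \Lambda
```

New rows have to be standardised with the training $\bar{x}_j$ and $s_j$, not their own, so that the new scores sit on the same scale the lm model was fitted on.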

I tested it below and it seems to work OK :)

# variables, factors, dimension of data
vars <- 5
f <- 2
N <- 10

# function from https://stat.ethz.ch/pipermail/r-help/2002-April/020278.html
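newFactors <- function(model_data, new_data, fitted_data){
  # regression-score coefficients: R^-1 %*% loadings
  coef <- solve(fitted_data$correlation) %*% fitted_data$loadings
  # standardise new_data with the means/sds of the data the model was fitted on
  means <- apply(model_data, 2, mean)
  sds <- apply(model_data, 2, sd)
  scale(new_data, means, sds) %*% coef
}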

# sample data
mydata <- as.data.frame(do.call(cbind, lapply(1:vars, function(i){
  runif(N)  # assumed: one random uniform column per variable
})))
target_data <- data.frame(y = runif(N))

# extract factors
fitted_data <- factanal(mydata,f,rotation="varimax",lower=0.05,scores="regression")
factor_data <- fitted_data$scores
# check that the new function reproduces factanal's scores on the training data
check <- newFactors(mydata, mydata, fitted_data)
max(abs(check-factor_data)) # tiny value: the difference is floating-point error only

# new data sample
N2 <- 3
new_data <- as.data.frame(do.call(cbind, lapply(1:vars, function(i){
  runif(N2)  # assumed: same generator as mydata, N2 rows
})))

# the factor scores (not loadings) for the new data
new_factor_data <- newFactors(mydata, new_data, fitted_data)
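To close the loop the question asks about (new scores into the lm model, then predict), here is a self-contained sketch on simulated data. Everything in it is hypothetical: the helper score_new, the two-factor simulation, and the variable names. It also assumes each row of predictors is already aligned with the target it should predict (the one-period lag is not simulated). It uses solve(R, Lambda), which is equivalent to solve(R) %*% Lambda, and copies the Factor1, Factor2, ... column names so that predict() can match them to the lm terms.

```r
set.seed(1)
n_train <- 50   # training periods (hypothetical)
p <- 6          # number of predictors
k <- 2          # number of factors

# simulated predictors: vars 1-3 load on factor 1, vars 4-6 on factor 2, plus noise
load_true <- cbind(c(0.8, 0.8, 0.8, 0, 0, 0), c(0, 0, 0, 0.8, 0.8, 0.8))
latent <- matrix(rnorm(n_train * k), n_train, k)
X <- as.data.frame(latent %*% t(load_true) +
                   matrix(rnorm(n_train * p, sd = 0.5), n_train, p))
y <- drop(latent %*% c(1, -0.5)) + rnorm(n_train, sd = 0.3)

fit <- factanal(X, k, rotation = "varimax", scores = "regression")

# hypothetical helper: regression scores for new rows, standardised with TRAINING means/sds
score_new <- function(fit, train_x, new_x) {
  coef <- solve(fit$correlation, fit$loadings)   # R^-1 %*% Lambda
  z <- scale(new_x, center = colMeans(train_x), scale = apply(train_x, 2, sd))
  out <- z %*% coef
  colnames(out) <- colnames(fit$scores)          # "Factor1", "Factor2"
  out
}

# regression of the target on the training scores
train <- data.frame(y = y, fit$scores)
mod <- lm(y ~ Factor1 + Factor2, data = train)

# three new periods: predictors observed, target not yet known
latent_new <- matrix(rnorm(3 * k), 3, k)
new_x <- as.data.frame(latent_new %*% t(load_true) +
                       matrix(rnorm(3 * p, sd = 0.5), 3, p))

new_scores <- score_new(fit, X, new_x)
pred <- predict(mod, newdata = as.data.frame(new_scores))
print(round(pred, 3))
```

Calling score_new(fit, X, X) should give back fit$scores up to floating-point error, which is the same sanity check the answer's code performs.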