I'm using factanal in R to reduce a 30-variable dataset to 7 factors, then using the factor scores it returns (fitted_data$scores in the code below) as predictors in an lm model. So far, so straightforward.

However, the independent variables I'm using are lagged one period relative to my dependent variable (the model is meant to predict the future). I now have all 30 input values I need to predict next period's dependent variable, so my question is this: how do I use the factanal output from the work I've already done to calculate the 7 factor scores for these 30 new values? Once I have these, I can use the lm model to predict the next period.

Example of the code I'm using below (target var is in the first column of mydata):

#extract factors
fitted_data <- factanal(mydata[,-1],7,rotation="varimax",lower=0.05,scores="regression")

#add factor scores back to main dataset
mydata  <- cbind(mydata,fitted_data$scores)

# linear regression model to predict my target variable using the factors I've extracted
mod1 <- lm(Target_Var ~ Factor1 + Factor2 + Factor3 + Factor4 + Factor5 + Factor6 + Factor7, data=mydata)

I have the latest 30 independent variables in a dataset called "new_data", and I'm just looking to calculate the 7 factor scores using the factor loadings already estimated, but can't for the life of me figure out how.

Any help greatly appreciated.

Hey, are you able to share something regarding your data, please? e.g. dput(head(mydata)) and dim(mydata) would be useful for replication. – Jonny Phelps

1 Answer

Solution is here: https://stat.ethz.ch/pipermail/r-help/2002-April/020278.html
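
For context, here is my reading of what the linked approach computes, written as a formula. It assumes scores="regression" and an orthogonal rotation such as varimax; an oblique rotation (e.g. promax) would need an extra factor-correlation term that is not covered here. The symbols Z, R, and Λ are my own notation, not from the linked post.

```latex
% Regression-method factor scores for new observations
% x_new : row vector of the new observations (p variables)
% \mu, \sigma : column means and standard deviations of the ORIGINAL model data
% R : correlation matrix of the original model data (fitted_data$correlation)
% \Lambda : rotated loadings matrix (fitted_data$loadings), p x k
\[
  z_{\text{new}} = \frac{x_{\text{new}} - \mu}{\sigma}, \qquad
  \hat{f}_{\text{new}} = z_{\text{new}} \, R^{-1} \Lambda
\]
```

The important detail is that the new data are standardised with the means and standard deviations of the data the model was fitted on, not with their own.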

I tested it out below, seems to work ok :)

# number of variables, number of factors, number of observations
vars <- 5
f <- 2
N <- 10

# function from https://stat.ethz.ch/pipermail/r-help/2002-April/020278.html
newFactors <- function(model_data, new_data, fitted_data){
  coef <- solve(fitted_data$correlation) %*% fitted_data$loadings
  means <- apply(model_data, 2, mean)
  sds <- apply(model_data, 2, sd)
  scale(new_data, means, sds) %*% coef
}

# sample data
mydata <- as.data.frame(do.call(cbind, lapply(1:vars, function(i){
  runif(N)
})))
target_data <- data.frame(y = runif(N))

# extract factors
fitted_data <- factanal(mydata,f,rotation="varimax",lower=0.05,scores="regression")
factor_data <- fitted_data$scores
# check scores with new function
check <- newFactors(mydata, mydata, fitted_data)
max(abs(check-factor_data)) # should be ~0; any difference is floating-point error

# new data sample
N2 <- 3
new_data <-  as.data.frame(do.call(cbind, lapply(1:vars, function(i){
  runif(N2)
})))

# the factor scores for the new data
new_factor_data <- newFactors(mydata, new_data, fitted_data)
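
To close the loop with the lm step from the question, the new scores can be passed to predict(). The following is a minimal, self-contained sketch rather than a definitive implementation: the simulated data, the two-factor structure, and the names train, fit, new_scores, and preds are all assumptions made for illustration. Only newFactors and the Target_Var / Factor1, Factor2 naming come from the thread.

```r
set.seed(42)

# Hypothetical training data with a genuine 2-factor structure
# (simulated purely for illustration, not the asker's data)
n <- 200
latent <- matrix(rnorm(n * 2), ncol = 2)
load_true <- matrix(c(0.9, 0.8, 0.7, 0.1, 0.0, 0.1,
                      0.0, 0.1, 0.1, 0.9, 0.8, 0.7), ncol = 2)
X <- latent %*% t(load_true) + matrix(rnorm(n * 6, sd = 0.5), ncol = 6)
colnames(X) <- paste0("x", 1:6)
train <- data.frame(Target_Var = latent[, 1] - latent[, 2] + rnorm(n, sd = 0.3), X)

# Fit the factor model and the regression on the factor scores
fit <- factanal(train[, -1], factors = 2, rotation = "varimax", scores = "regression")
train_fs <- cbind(train, fit$scores)
mod1 <- lm(Target_Var ~ Factor1 + Factor2, data = train_fs)

# Same scoring function as in the answer above
newFactors <- function(model_data, new_data, fitted_data) {
  coef <- solve(fitted_data$correlation) %*% fitted_data$loadings
  means <- apply(model_data, 2, mean)
  sds <- apply(model_data, 2, sd)
  scale(new_data, means, sds) %*% coef
}

# Stand-in for the latest period's predictors (assumption: same columns, same order)
new_data <- train[1:3, -1] + 0.1

# Score the new rows, then predict; predict() matches on the column names
# Factor1, Factor2, which the result inherits from the loadings matrix
new_scores <- as.data.frame(newFactors(train[, -1], new_data, fit))
preds <- predict(mod1, newdata = new_scores)
preds
```

The new data must have the same columns in the same order as the data passed to factanal, since scale() and the matrix product match columns by position.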