I searched for this question here but couldn't find anything, so apologies if it has already been answered. My dataset consists of daily information for a large number of stocks (1000+) over a 10-year period. I have read it in as a data frame time series where each column is a separate stock. I would like to regress each stock on month dummy variables to capture the seasonal variation, and then obtain the residuals. This is what I have done so far:

for (i in 1:1000) {
  month.f <- factor(months(time(stockinfo[, i])))
  dummy <- model.matrix(month.f)
  residStock[, 1] <- residuals(lm(stockinfo[, i] ~ dummy, na.action = na.exclude))
}
# stockinfo is a data.frame

Is this the correct way to do it?

Secondly, I would like to run a regression using the residuals as the dependent variable and independent variables taken from another data frame. What would be the best way to do this? Would I have to use a for loop again?

Thank you very much for your help.

Thank you so much, I couldn't find that thread before. My apologies. Would you mind telling me whether my method for dummy variables is correct or not? Thanks again. - user2672759
Sorry, I can't help you right now. The only advice I can give is to provide some small test data that represents the most important properties of your real data. This will make people much more willing to help you. For example, one date column and two stock columns, 20-ish rows. A solution for such test data can most likely be scaled to your 1000+ columns. - Henrik

1 Answer

You can put the stocks in a list and use the Map function, which avoids an explicit R for loop. (Not tested, since you didn't provide sample data.)

Assume your data frame is mydata, with a month column coded 1, 2, and so on. If there are 12 months, you use 11 dummies, and the omitted month serves as the base.

mystock <- list("APP~", "INTEL~", "MICROSOFT~")  # stock names with a trailing tilde
myresi <- Map(function(x) residuals(lm(as.formula(paste(x, "factor(month)")), data = mydata)), mystock)  # factor(month) gives 11 dummies, with the first month as the base
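
Since no sample data was posted, here is a minimal sketch of the same idea on made-up data. The dates, the two stock columns, and the injected NA are all hypothetical, and it uses sapply with reformulate instead of Map with pasted strings so that the residuals come back as a matrix with one column per stock.

```r
# Hypothetical toy data: ten years of daily dates and two made-up stock columns.
set.seed(1)
dates <- seq(as.Date("2004-01-01"), as.Date("2013-12-31"), by = "day")
mydata <- data.frame(
  month = as.integer(format(dates, "%m")),  # month coded 1..12
  APP   = rnorm(length(dates)),
  INTEL = rnorm(length(dates))
)
mydata$APP[5] <- NA  # one missing value, to show how NAs are carried through

stocks <- c("APP", "INTEL")

# Regress each stock on month dummies. factor(month) expands to 11 dummies
# plus an intercept, with month 1 as the base.
# na.exclude pads the residuals with NA so every column keeps the full row count.
myresi <- sapply(stocks, function(s) {
  fit <- lm(reformulate("factor(month)", response = s),
            data = mydata, na.action = na.exclude)
  residuals(fit)
})

dim(myresi)  # one row per day, one column per stock
```

Because every residual vector has the same length as the original data, the matrix lines up row for row with the dates, which makes it easier to merge with other variables later.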

Say your independent variables are indep1, indep2, and indep3, and your dependent variable is dep (assuming that dep and the independent variables have the same names for every stock):

myestimate <- Map(function(x) lm(dep ~ indep1 + indep2 + indep3, data = x), myresi)
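
For the case in the question, where the residuals themselves are the dependent variable and the regressors live in a separate data frame, one possible sketch is below. It is untested against the real data: residStock and others are hypothetical stand-ins (random numbers here), and it assumes the two objects have the same number of rows in the same date order.

```r
# Hypothetical inputs: a matrix of first-stage residuals (one column per stock)
# and a separate data frame of regressors with matching rows.
set.seed(2)
n <- 200
residStock <- matrix(rnorm(n * 3), ncol = 3,
                     dimnames = list(NULL, c("APP", "INTEL", "MICROSOFT")))
others <- data.frame(indep1 = rnorm(n), indep2 = rnorm(n), indep3 = rnorm(n))

# A matrix on the left-hand side makes lm() fit one regression per column,
# so no explicit loop is needed.
fit <- lm(residStock ~ indep1 + indep2 + indep3, data = others)
coef(fit)  # 4 x 3 matrix: one column of coefficients per stock

# The same fits as a list of separate lm objects, in case per-stock
# summaries or diagnostics are needed.
fits <- lapply(colnames(residStock), function(s) {
  lm(residStock[, s] ~ indep1 + indep2 + indep3, data = others)
})
names(fits) <- colnames(residStock)
```

One caveat with the matrix form: a row with an NA in any stock is dropped for all stocks, so the list version is likely the safer choice when different stocks have different missing days.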