I am trying to use data.table, lapply, and a function call to regress the same response variable on each of the other variables, one at a time. I would like a simple table as output showing each variable and its coefficient of determination.
I am using RStudio 1.2.1335 and data.table 1.12.2. The data set I am using is "http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Appendix%20C%20Data%20Sets/APPENC02.txt"
cnames<-c("ID","County","State","Area","Pop","Young","Old","Phys","Beds","Crime","HighSchool","BA","Poverty","Unemploy","PerCapitaIncome","TotalIncome","Region")
df62<-fread("APPENC02.txt", col.names=cnames)
df62[,c("ID", "County","State","Region"):=NULL]
variability<-function(y){
model<-eval(substitute(lm(Phys~y, data=df62)))
anova<-anova(model)
SSR<- anova$`Sum Sq`[1]
SSE<- anova$`Sum Sq`[2]
SSTO<-SSR+SSE
R2<-SSR/SSTO
return(R2)
}
df62[ , lapply(.SD, variability)]
This works if the last line is:
df62[ , lapply(.SD, variability), by=Phys]
Error message when I omit the 'by' clause: "Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, : object 'i' not found"
If I group by the variable 'Phys', I get correct results, but I have each result needlessly repeated.
eval(substitute())? - Roman Luštrik
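Following up on the eval(substitute()) question: a likely cause of the error is that, inside a plain lapply, substitute(y) captures lapply's internal expression X[[i]] rather than a column name, so lm cannot resolve it. A minimal sketch of one way around this is to loop over column names and build each formula with reformulate(), taking R^2 from summary(). To keep it self-contained, the sketch uses the built-in mtcars data as a stand-in for df62 and mpg as a stand-in for Phys; the names dt, response, r_squared, predictors and result are hypothetical and not part of the original code.

```r
library(data.table)

# Stand-in data (assumption): mtcars plays the role of df62, mpg the role of Phys.
dt <- as.data.table(mtcars)
response <- "mpg"

# R^2 of a simple regression of `response` on one predictor, given by name.
# reformulate() builds the formula from strings, so no substitute() is needed.
r_squared <- function(predictor, data, response) {
  model <- lm(reformulate(predictor, response = response), data = data)
  summary(model)$r.squared
}

# Every column except the response is treated as a predictor.
predictors <- setdiff(names(dt), response)

# One row per predictor: its name and the R^2 of its regression.
result <- data.table(
  variable = predictors,
  R2 = vapply(predictors, r_squared, numeric(1),
              data = dt, response = response, USE.NAMES = FALSE)
)

# Show the table with the largest R^2 first.
result[order(-R2)]
```

With the real data, setting dt to df62 and response to "Phys" should give one row per remaining column. Looping over names rather than over .SD avoids the substitution problem entirely and needs no by clause, so nothing is repeated.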