OK, I'm working on a silly toy problem in R (part of an edX course, actually): running a bunch of bivariate logits and looking at the p-values. I'm trying to add some coding practice to my data-crunching practice by doing the chore as a for loop rather than as a bunch of individual models. So I pulled the variable names I wanted out of the data frame, stuck them in a vector, and passed that vector to glm() with a for loop.
After about an hour and a half of searching and hacking around to deal with the inevitable variable length errors, I realized that R was interpreting the elements of the variable vector as character strings rather than variable names. Solved that problem, ended up with a final working loop as follows:
for (i in seq_along(dumber)) {
  print(summary(glm(WorldSeries ~ get(dumber[i]), data = baseball, family = binomial)))
}
where dumber is the vector of independent variable names and WorldSeries is the dependent variable.
And that was awesome... except for one little problem. The console output is a bunch of model summaries, which is what I want, but the summaries aren't labelled with the variable names. Instead, they're just labelled with the code from the for loop! For example, here are the summaries for two of the variables my little loop went through:
Call:
glm(formula = WorldSeries ~ get(dumber[i]), family = binomial, data = baseball)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.5610  -0.5209  -0.5088  -0.4902   2.1268

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)    -0.08725    6.07285  -0.014    0.989
get(dumber[i]) -4.65992   15.06881  -0.309    0.757

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 84.926  on 113  degrees of freedom
Residual deviance: 84.830  on 112  degrees of freedom
  (130 observations deleted due to missingness)
AIC: 88.83

Number of Fisher Scoring iterations: 4


Call:
glm(formula = WorldSeries ~ get(dumber[i]), family = binomial, data = baseball)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9871  -0.8017  -0.5089  -0.5089   2.2643

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     0.03868    0.43750   0.088 0.929559
get(dumber[i]) -0.25220    0.07422  -3.398 0.000678 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 239.12  on 243  degrees of freedom
Residual deviance: 226.96  on 242  degrees of freedom
AIC: 230.96

Number of Fisher Scoring iterations: 4
That's obviously hopeless, especially as the number of elements of the variable vector increases. I'm sure if I knew a lot more about object-oriented programming than I do, I'd be able to just create some kind of complicated object that has the elements of dumber matched to the model summaries, or directly tinker with the summaries to insert the elements of dumber into where it currently just reads "get(dumber[i])". But I currently know jack-all about OOP (I'm learning! It's slow!). So does anyone wanna clue me in? Thanks!
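For concreteness, here is a rough sketch of the kind of labelled result described above. It is only one possible approach, not a definitive one: it assumes the formula can be built from each name with reformulate() instead of get(), and it runs on a made-up data frame (`toy`, with hypothetical columns `RS` and `RA`) rather than the real baseball data. The names `predictors`, `models`, and `p_values` are likewise just for illustration.

```r
# Hypothetical stand-in for the baseball data: made-up numbers, illustration only.
set.seed(1)
toy <- data.frame(
  WorldSeries = rbinom(60, 1, 0.3),
  RS = rnorm(60, 700, 80),
  RA = rnorm(60, 700, 80)
)
predictors <- c("RS", "RA")  # plays the role of `dumber`

# Build each formula from the variable name itself, so the coefficient
# table row is labelled "RS" or "RA" rather than "get(dumber[i])".
models <- lapply(predictors, function(v) {
  f <- reformulate(v, response = "WorldSeries")
  glm(f, data = toy, family = binomial)
})
names(models) <- predictors  # each model is now stored under its variable name

# Print each summary under a header carrying the variable name.
for (v in predictors) {
  cat("\n=====", v, "=====\n")
  print(summary(models[[v]]))
}

# Pull the predictor's p-value out of each model into one named vector.
p_values <- sapply(models, function(m) coef(summary(m))[2, "Pr(>|z|)"])
print(p_values)
```

In this sketch the Call line of each summary would still read `formula = f`, which is why a header is printed before each one; the coefficient rows and the names on `p_values` are what carry the variable names.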