
Ok, I'm working on a silly toy problem in R (part of an edX course, actually): running a bunch of bivariate logits and looking at the p-values. I'm trying to add some coding practice to my data-crunching practice by doing the chore as a for loop rather than as a bunch of individual models. So I pulled the variable names I wanted out of the data frame, stuck them in a vector, and passed that vector to glm() with a for loop.

After about an hour and a half of searching and hacking around to deal with the inevitable variable length errors, I realized that R was interpreting the elements of the variable vector as character strings rather than variable names. I solved that problem and ended up with a final working loop as follows:

for (i in 1:length(dumber)) { 
  print(summary(glm(WorldSeries ~ get(dumber[i]) , data=baseball, family=binomial)))
} 

where dumber is the vector of independent variable names and WorldSeries is the dependent variable.

And that was awesome... except for one little problem. The console output is a bunch of model summaries, which is what I want, but the summaries aren't labelled with the variable names. Instead, they're just labelled with the code from the for loop! For example, here are the summaries for two of the variables my little loop went through:

Call:
glm(formula = WorldSeries ~ get(dumber[i]), family = binomial, 
    data = baseball)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.5610  -0.5209  -0.5088  -0.4902   2.1268  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)    -0.08725    6.07285  -0.014    0.989
get(dumber[i]) -4.65992   15.06881  -0.309    0.757

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 84.926  on 113  degrees of freedom
Residual deviance: 84.830  on 112  degrees of freedom
  (130 observations deleted due to missingness)
AIC: 88.83

Number of Fisher Scoring iterations: 4


Call:
glm(formula = WorldSeries ~ get(dumber[i]), family = binomial, 
    data = baseball)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.9871  -0.8017  -0.5089  -0.5089   2.2643  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     0.03868    0.43750   0.088 0.929559    
get(dumber[i]) -0.25220    0.07422  -3.398 0.000678 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 239.12  on 243  degrees of freedom
Residual deviance: 226.96  on 242  degrees of freedom
AIC: 230.96

Number of Fisher Scoring iterations: 4

That's obviously hopeless, especially as the number of elements of the variable vector increases. I'm sure if I knew a lot more about object-oriented programming than I do, I'd be able to just create some kind of complicated object that has the elements of dumber matched to the model summaries, or directly tinker with the summaries to insert the elements of dumber into where it currently just reads "get(dumber[i])". But I currently know jack-all about OOP (I'm learning! It's slow!). So does anyone wanna clue me in? Thanks!

1 Answer


You could do this (only send the outcome and predictor columns one at a time to glm):

for (i in 1:length(dumber)) {
  print(summary(glm(WorldSeries ~ .,
                    data=baseball[, c("WorldSeries", dumber[i])],
                    family=binomial)))
}
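
A related option (not part of the loop above, just a sketch) is to build the formula from the string with reformulate(), so the coefficient row is labelled with the predictor's real name. Since the baseball data frame isn't reproduced here, this uses the built-in mtcars data with am as a stand-in binary outcome; the predictor names are purely illustrative.

```r
# Stand-in for 'dumber': illustrative predictor names from mtcars (an assumption)
predictors <- c("mpg", "wt", "hp")

for (p in predictors) {
  # reformulate() turns the string p into the formula am ~ <p>
  f <- reformulate(p, response = "am")
  fit <- glm(f, data = mtcars, family = binomial)
  # The coefficient table now has a row named after the predictor itself
  print(summary(fit))
}
```

Note that the "Call:" line will still show `formula = f`; only the coefficient table picks up the variable name.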

You could also do this (label each output with the current element of 'dumber'):

for (i in 1:length(dumber)) {
  print(paste0("Current predictor is ... ", dumber[i]))
  print(summary(glm(WorldSeries ~ get(dumber[i]), data=baseball, family=binomial)))
}

As you progress down the road to R mastery, you would probably want to build a list of summary objects and then use lapply to print or cat your tailored output.
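
A minimal sketch of that list-of-summaries idea, again assuming the built-in mtcars data (am as the binary outcome) in place of baseball, with made-up predictor names standing in for dumber:

```r
# Stand-in for 'dumber': illustrative predictor names from mtcars (an assumption)
predictors <- c("mpg", "wt", "hp")

# Fit one logistic regression per predictor and keep each summary object
summaries <- lapply(predictors, function(p) {
  summary(glm(reformulate(p, response = "am"), data = mtcars, family = binomial))
})
names(summaries) <- predictors

# Print one compact, labelled line per predictor
invisible(lapply(names(summaries), function(p) {
  # Row p of the coefficient matrix; column "Pr(>|z|)" holds the p-value
  pval <- summaries[[p]]$coefficients[p, "Pr(>|z|)"]
  cat(sprintf("%-5s p-value = %.4f\n", p, pval))
}))
```

Because the list is named, summaries[["wt"]] (for example) pulls out the full summary for a single predictor whenever you want the whole thing.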