Say I am fitting a GLM and want to specify the predictor terms using a variable that holds a list of the column names to fit. How do I do that?

I tried

using GLM
vars = ["a", "b", "c"]
glm(@formula(label~terms.(vars)), data = df)

But it doesn't work. I want to do


glm(@formula(label~a+b+c), data=df)

but of course a, b, and c are specific to this data set. I need something that takes a set of terms as a vector of strings and builds the right formula from it.

1 Answer

See the StatsModels.jl documentation on constructing a formula programmatically.

If you have a vector of strings, you can do

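using GLM, StatsModels # StatsModels provides Term
vars = ["a", "b", "c"] .|> Symbol # Or just start with symbols...
f = Term(:label) ~ sum(Term.(vars)) # wrap each symbol in a Term before summing
glm(f, df, Binomial()) # or whatever you want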

The construction yields

julia> f = Term(:label)~sum(Term.(vars))
FormulaTerm
Response:
  label(unknown)
Predictors:
  a(unknown)
  b(unknown)
  c(unknown)

Which gives

julia> glm(f, df, Binomial())
StatsModels.TableRegressionModel{...

label ~ 1 + a + b + c

Coefficients:
─────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      z  Pr(>|z|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept)   1.73573      2.6848    0.65    0.5180   -3.52637    6.99784
a             0.777821     3.78865   0.21    0.8373   -6.6478     8.20344
b            -0.874216     2.48471  -0.35    0.7250   -5.74415    3.99572
c            -2.20088      2.19081  -1.00    0.3151   -6.49478    2.09303
─────────────────────────────────────────────────────────────────────────

Note: I used random data, so ignore the values in the table.
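
If you need this in several places, the construction above can be wrapped in a small function. This is only a sketch: `build_formula` is a name I made up (it is not part of GLM.jl or StatsModels.jl), and it assumes every name in the vector is a plain column name, with no interactions or transformations.

```julia
using StatsModels  # provides Term and the ~ and + methods for terms

# Hypothetical helper (not a library function): build
# `response ~ p1 + p2 + ...` from column names given as strings.
function build_formula(response::AbstractString,
                       predictors::AbstractVector{<:AbstractString})
    isempty(predictors) && throw(ArgumentError("need at least one predictor"))
    # Convert each name to a Symbol, wrap it in a Term, and combine with +.
    rhs = sum(Term.(Symbol.(predictors)))
    return Term(Symbol(response)) ~ rhs
end

f = build_formula("label", ["a", "b", "c"])
```

The result can then be passed on exactly as before, e.g. `glm(f, df, Binomial())`.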