4
votes

I have a DataFrame in Julia with fewer than 10 columns. I want to generate a list of all possible formulas that could be fed into a linear model (e.g., [Y~X1+X2+X3, Y~X1+X2, ...]). I can do this easily with combinations() and string versions of the column names, but converting the strings into Formula objects fails. According to the DataFrames.jl documentation, a Formula can only be constructed from expressions, and I can indeed make a list of the individual column names as expressions. Is there a way to join several expressions with the "+" operator programmatically, so that the resulting composite expression can be passed as the RHS of the Formula constructor? My impulse is to look for a function that converts an arbitrary string into the equivalent expression, but I am not sure that is the right approach.

2 Answers

5
votes

The function parse takes a string, parses it, and returns an expression. I see nothing wrong with using it for what you're talking about.
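
As a minimal sketch of that idea, assuming a Julia 1.x release (where parse is spelled Meta.parse) and hypothetical column names Y, X1, X2, X3: a formula string can be parsed into an expression, and the same expression can also be assembled directly with Expr, without going through a string at all.

```julia
# Hypothetical column names, for illustration only.
response = :Y
predictors = [:X1, :X2, :X3]

# Route 1: build the formula as a string, then parse it into an expression.
formula_string = string(response, " ~ ", join(predictors, " + "))
parsed = Meta.parse(formula_string)

# Route 2: assemble the same expression directly from the symbols.
# This assumes at least two predictors; with a single one,
# Expr(:call, :+, x) is a unary plus rather than the bare symbol.
rhs = Expr(:call, :+, predictors...)
built = Expr(:call, :~, response, rhs)

println(parsed)
println(built)
```

Either route yields an ordinary Expr. How that expression is then turned into a formula object depends on the package version in use.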

1
votes

Here is some actual working code, because I have been struggling to get a similar problem to work. Note that this targets Julia 1.3.1, where parse is now Meta.parse, and that I used IterTools.subsets instead of combinations. The code builds each formula from Term objects directly, so no string parsing is needed.

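using RDatasets, DataFrames, IterTools, GLM

# Rename "Solar.R" so that every column name is a plain identifier.
airquality = rename(dataset("datasets", "airquality"), "Solar.R" => "Solar_R")

# Term needs Symbols, and newer DataFrames versions return Strings from names().
predictors = setdiff(Symbol.(names(airquality)), [:Temp])

for combination in subsets(predictors)
  # Skip the empty subset, which has no predictors to fit.
  isempty(combination) && continue
  # A tuple of Terms on the right-hand side stands for their sum.
  formula = FormulaTerm(Term(:Temp), Tuple(Term.(combination)))
  @show lm(formula, airquality)
end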
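
The same kind of formula can also be written with the + and ~ operators that StatsModels defines for terms. The sketch below is an assumption-laden variant rather than part of the code above: it assumes the StatsModels package (which GLM builds on) is installed, uses hypothetical column names, enumerates subsets with plain bitmasks instead of IterTools, and only constructs the formulas without fitting anything.

```julia
using StatsModels

# Hypothetical column names, for illustration only.
response = :Y
predictors = [:X1, :X2, :X3]

formulas = FormulaTerm[]
# Each bitmask from 1 to 2^n - 1 selects one non-empty subset of predictors.
for mask in 1:(2^length(predictors) - 1)
  chosen = [p for (i, p) in enumerate(predictors) if ((mask >> (i - 1)) & 1) == 1]
  # `+` combines terms, and `~` builds a FormulaTerm from the two sides.
  push!(formulas, term(response) ~ reduce(+, term.(chosen)))
end

foreach(println, formulas)
```

Each element of formulas could then be passed to lm together with a data frame that actually has those columns.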