I first use grep to obtain all variable names that begin with the prefix "h_". I then collapse that array into a single string, separated with plus signs. Is there a way to subsequently use this string in a linear regression?
For example:
holiday_array <- grep("^h_", names(df), value = TRUE)  # "^" anchors the match to the start of the name
holiday_string <- paste(holiday_array, collapse = ' + ')
r_3 <- lm(log(assaults) ~ year + month + holiday_string, data = df)
I get the straightforward error: variable lengths differ (found for 'holiday_string').
I can do it like this, for example:
holiday_formula <- as.formula(paste('log(assaults) ~ attend_v + year + month +', paste(holiday_array, collapse = ' + ')))
r_3 <- lm(holiday_formula, data = df)
But I don't want to have to type a separate formula construction for each new set of controls. I want to be able to add the "string" inside the lm function. Is this possible?
The above is problematic, because let's say I want to then add another set of control variables to the formula contained in holiday_formula, so something like this:
weather_vars <- grep("w_", names(df), value=TRUE)
weather_formula <- as.formula(paste(holiday_formula, paste("+", weather_vars, collapse='+')))
Not sure how you would do the above.
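One way this might be approached, offered as a sketch rather than a definitive answer: `reformulate()` builds a formula from a character vector of term labels, and `update()` appends terms to an existing formula, so each new set of controls costs one extra call. The toy data frame and its column names (`h_xmas`, `h_july4`, `w_rain`, `w_temp`) are invented for illustration, and the `^h_` / `^w_` patterns assume the prefixes sit at the start of the names.

```r
# Hypothetical toy data standing in for df; every column name is an assumption.
df <- data.frame(
  assaults = c(12, 15, 9, 20, 18, 11, 14, 16, 10, 13, 17, 19),
  year     = rep(c(2001, 2002), each = 6),
  month    = rep(1:6, times = 2),
  h_xmas   = c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  h_july4  = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
  w_rain   = c(1.2, 0.0, 3.1, 0.4, 2.2, 0.0, 0.9, 1.5, 0.0, 2.8, 0.3, 1.1),
  w_temp   = c(30, 35, 41, 55, 63, 72, 28, 37, 44, 52, 66, 70)
)

holiday_array <- grep("^h_", names(df), value = TRUE)
weather_vars  <- grep("^w_", names(df), value = TRUE)

# Build the base formula from term labels; quote() keeps log(assaults) as a call.
holiday_formula <- reformulate(c("year", "month", holiday_array),
                               response = quote(log(assaults)))

# In update(), "." on each side stands for the matching side of the old formula,
# so this appends the weather terms to whatever holiday_formula already holds.
weather_formula <- update(
  holiday_formula,
  as.formula(paste(". ~ . +", paste(weather_vars, collapse = " + ")))
)

r_3 <- lm(holiday_formula, data = df)
r_4 <- lm(weather_formula, data = df)
```

Because the formula is assembled before `lm()` is called, the string never has to be evaluated inside the `lm()` call itself; whether that satisfies the original wish is a judgment call.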
year + month + as.formula(holiday_string), but I can't. – Parseltongue