Is it at all possible to use the lm() function with a matrix? Or maybe the right question is: is it possible to create formulas dynamically in R?
I am writing a function whose output is a matrix, and the number of columns in that matrix is not fixed; it depends on the user's inputs. I want to fit an OLS model to the data in the matrix:
- The first column is the dependent variable.
- The other columns are the independent variables.
Using the lm() function requires a formula, and writing one out presupposes knowing the number of explanatory variables in advance, which I don't!
Is there any solution other than estimating the equation manually with the OLS formula?
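For comparison, the "manual" route is short in base R and needs no formula either. The sketch below assumes, as in the question, that column 1 is the response and every other column is a regressor; the names `m`, `X`, `beta_manual` and the seed are illustrative only. `lm.fit()` is the lower-level routine that `lm()` calls for the actual QR fit, and it takes a design matrix directly.

```r
# Sketch: OLS without a formula, assuming column 1 of `m` is the response
# and all remaining columns are regressors (any number of them).
set.seed(1)
m <- replicate(5, rnorm(50))
colnames(m) <- c("dep", paste0("ind", 1:(ncol(m) - 1)))

y <- m[, 1]
# Design matrix: intercept column plus every regressor column
X <- cbind("(Intercept)" = 1, m[, -1, drop = FALSE])

# Normal equations beta_hat = (X'X)^{-1} X'y, solved without an explicit inverse
beta_manual <- drop(solve(crossprod(X), crossprod(X, y)))

# lm.fit() accepts the same design matrix and uses a QR decomposition
beta_lmfit <- lm.fit(X, y)$coefficients

print(beta_manual)
all.equal(beta_manual, beta_lmfit)  # should be TRUE up to numerical tolerance
```

lm.fit() returns only the raw fit components, so summary() and the other lm methods are not available on its result; that is the main reason to prefer a formula-based lm() call.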
Reproducible example:
> # When user 1 uses the function, he obtains m1
> m1 <- replicate(5, rnorm(50))
> colnames(m1) <- c("dep", paste0("ind", 1:(ncol(m1)-1)))
> head(m1)
dep ind1 ind2 ind3 ind4
[1,] 0.5848705 0.3602760 -0.95493403 -1.7278030 -0.1914170
[2,] 1.7167604 -0.1035825 0.31026183 -1.5071415 -1.2748600
[3,] -0.1326187 -0.5669026 0.01819749 0.8346880 -0.6304498
[4,] -0.7381232 0.4612792 -0.36132404 -0.1183131 -0.7446985
[5,] 0.9919123 -1.3228248 -0.44728270 0.6571244 -0.4895385
[6,] -0.8010111 0.8307584 -0.16106804 0.3069870 -0.3834583
>
> # When user 2 uses the function, he obtains m2
> m2 <- replicate(6, rnorm(50))
> colnames(m2) <- c("dep", paste0("ind", 1:(ncol(m2)-1)))
> head(m2)
dep ind1 ind2 ind3 ind4 ind5
[1,] 1.2936031 -0.8060085 0.5020699 -1.699123234 1.0205626 1.0787888
[2,] 1.2357370 0.5973699 -1.2134283 -0.928040354 -0.3037920 -0.1251678
[3,] 0.5292583 0.1063213 -1.3036526 0.395886937 -0.1280863 1.1423532
[4,] 0.9234484 -0.4505604 1.2796922 0.424705893 -0.5547274 -0.3794037
[5,] -0.8016376 1.1362677 -1.1935238 -0.004460092 -1.4449704 -0.3739311
[6,] 0.4385867 0.5671138 0.4493617 -2.277925642 -0.8626944 -0.6880523
User 1 will estimate the linear model with:
lm(dep ~ ind1 + ind2 + ind3 + ind4, data = as.data.frame(m1))
Meanwhile user 2 has an extra independent variable and will estimate the linear model in the following way:
lm(dep ~ ind1 + ind2 + ind3 + ind4 + ind5, data = as.data.frame(m2))
Once again, is there any way I can create the formula dynamically?
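One way to build the formula dynamically is base R's reformulate(), which takes the regressor names as a character vector and the response name separately. A minimal sketch, assuming column 1 is always the response; `fit_ols` is a hypothetical helper name and the seed is arbitrary. Note that lm()'s data argument has to be a data frame, so the matrix is converted first.

```r
# Hypothetical helper: fits OLS with column 1 as the response and all
# remaining columns as regressors, whatever their number.
fit_ols <- function(m) {
  dat <- as.data.frame(m)  # lm() expects a data frame, not a matrix
  f <- reformulate(termlabels = colnames(m)[-1], response = colnames(m)[1])
  lm(f, data = dat)
}

# Rebuild the two example matrices from the question (seed is arbitrary)
set.seed(42)
m1 <- replicate(5, rnorm(50))
colnames(m1) <- c("dep", paste0("ind", 1:(ncol(m1) - 1)))
m2 <- replicate(6, rnorm(50))
colnames(m2) <- c("dep", paste0("ind", 1:(ncol(m2) - 1)))

fit1 <- fit_ols(m1)  # regressors ind1..ind4
fit2 <- fit_ols(m2)  # regressors ind1..ind5
coef(fit2)

# Equivalent formula built by pasting strings together
f_alt <- as.formula(paste(colnames(m1)[1], "~",
                          paste(colnames(m1)[-1], collapse = " + ")))
```

Because the formula names only the columns it is given, it will not silently pick up extra columns that are added to the data later, which is the concern with dep ~ . raised in the comments.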
lm(dep ~ ., data = m1) – Khashaa
dep ~ . is bad style because it will pick up any extra or derived columns you create, possibly causing data leakage. – smci
reformulate function. – SavedByJESUS
lm(m1[,'dep'] ~ m1[,2:5]) – smci
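smci's last comment hard-codes the columns 2:5; indexing with -1 instead keeps every column except the response, so the same call works for m1 and m2 alike. A sketch, with the names `y`, `X`, `fit` and the seed chosen only for illustration. The drawback is cosmetic: the coefficient names get the matrix's name as a prefix.

```r
# Sketch: pass the matrix columns straight to lm(). The right-hand side is
# a single matrix term, so any number of regressors works. Column 1 is
# assumed to be the response.
set.seed(7)
m2 <- replicate(6, rnorm(50))
colnames(m2) <- c("dep", paste0("ind", 1:(ncol(m2) - 1)))

y <- m2[, 1]
X <- m2[, -1, drop = FALSE]  # drop = FALSE keeps a matrix even with one regressor
fit <- lm(y ~ X)

coef(fit)  # names are "(Intercept)", "Xind1", ..., "Xind5"
```

The estimates are the same as those from lm(dep ~ ., data = as.data.frame(m2)); only the labels differ.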