0
votes

I'm running a linear regression with all predictors (I have 384 predictors), but I only get 373 coefficients from summary. Why does R not return all of the coefficients, and how can I get all 384?

full_lm <- lm(Y ~ ., data=dat[,2:385]) #384 predictors
coef_lm <- as.matrix(summary(full_lm)$coefficients[,4]) #only gives me 373
2
You will likely find the answer by examining the output of summary(full_lm). – lmo

2 Answers

1
votes

First, note that summary(full_lm)$coefficients[,4] returns the p-values, not the coefficients (the estimates are in column 1). As for your actual question: most likely some of your variables drop out of the estimation because they are perfectly collinear with others. If you run summary(full_lm), you will see that these variables have NA in every field, and they are omitted from summary(full_lm)$coefficients. As an example:

x <- rnorm(1000)
x1 <- 2 * x                       # perfectly collinear with x
x2 <- runif(1000)
eps <- rnorm(1000)
y <- 5 + 3 * x + x1 + x2 + eps
full_lm <- lm(y ~ x + x1 + x2)
summary(full_lm)
#Call:
#lm(formula = y ~ x + x1 + x2)
#
#Residuals:
#     Min       1Q   Median       3Q      Max 
#-2.90396 -0.67761 -0.02374  0.71906  2.88259 
#
#Coefficients: (1 not defined because of singularities)
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept)  4.96254    0.06379   77.79   <2e-16 ***
#x            5.04771    0.03497  144.33   <2e-16 ***
#x1                NA         NA      NA       NA    
#x2           1.05833    0.11259    9.40   <2e-16 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 1.024 on 997 degrees of freedom
#Multiple R-squared:  0.9546,   Adjusted R-squared:  0.9545 
#F-statistic: 1.048e+04 on 2 and 997 DF,  p-value: < 2.2e-16

coef_lm <- as.matrix(summary(full_lm)$coefficients[,1])
coef_lm
#(Intercept)    4.962538
#x  5.047709
#x2 1.058327
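
If what you want is one entry per term, including the dropped ones, coef() on the fitted model keeps the aliased terms as NA, whereas summary()$coefficients leaves those rows out. A minimal sketch on the same simulated data (the seed and the name all_coef are my own additions for illustration, so the numbers will differ from the output above):

```r
set.seed(1)                       # assumption: seed added only for reproducibility
x   <- rnorm(1000)
x1  <- 2 * x                      # perfectly collinear with x
x2  <- runif(1000)
eps <- rnorm(1000)
y   <- 5 + 3 * x + x1 + x2 + eps
full_lm <- lm(y ~ x + x1 + x2)

all_coef <- coef(full_lm)         # one entry per term; x1 is NA
length(all_coef)                  # intercept + 3 predictors
names(all_coef)[is.na(all_coef)]  # terms dropped because of singularities
nrow(summary(full_lm)$coefficients)  # aliased rows are omitted here
```

In your case, names(coef(full_lm))[is.na(coef(full_lm))] should list the predictors that were dropped, and alias(full_lm) shows which other columns they depend on.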
0
votes

For example, if some columns in your data are linear combinations of others, lm() reports NA for their coefficients, and summary()$coefficients omits those rows, so indexing it the way you do drops them automatically.

a <- rnorm(100)
b <- rnorm(100)
c <- rnorm(100)
d <- b + 2*c

e <- lm(a ~ b + c + d)

Printing e gives

Call:
lm(formula = a ~ b + c + d)

Coefficients:
(Intercept)            b            c            d  
   0.088463    -0.008097    -0.077994           NA  

But indexing the coefficient matrix from summary() drops d:

> as.matrix(summary(e)$coefficients)[, 4]
(Intercept)           b           c 
  0.3651726   0.9435427   0.3562072
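
If you need the p-values lined up with every term, one option is to start from a vector of NAs named after coef(e) and fill in the rows that summary() does return. This is only a sketch: the seed and the names p_tab and p_all are hypothetical additions, not part of the original example.

```r
set.seed(1)                       # assumption: seed added only for reproducibility
a <- rnorm(100)
b <- rnorm(100)
c <- rnorm(100)
d <- b + 2 * c                    # linear combination of b and c
e <- lm(a ~ b + c + d)

p_tab <- summary(e)$coefficients  # matrix without the aliased row for d
p_all <- setNames(rep(NA_real_, length(coef(e))), names(coef(e)))
p_all[rownames(p_tab)] <- p_tab[, "Pr(>|t|)"]
p_all                             # four entries; d stays NA
```

Matching on row names rather than position keeps each p-value attached to the right term no matter which columns were dropped.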