Linear regression not returning all coefficients

Question

I'm running a linear regression with all predictors (I have 384 predictors), but I only get 373 coefficients from summary. Why doesn't R return all the coefficients, and how can I get all 384?

full_lm <- lm(Y ~ ., data=dat[,2:385]) #384 predictors
coef_lm <- as.matrix(summary(full_lm)$coefficients[,4]) #only gives me 373

You will likely find the answer by examining the output of summary(full_lm). — lmo

Yannis Vassiliadis · Accepted Answer · 2018-04-22T19:10:15

First, summary(full_lm)$coefficients[,4] returns the p-values, not the coefficients; the estimates are in column 1. Now, to actually answer your question: I believe some of your variables drop out of the estimation because they are perfectly collinear with others. If you run summary(full_lm), you will see that these variables show NA in every field, so they are left out of summary(full_lm)$coefficients. As an example:

x <- rnorm(1000)
x1 <- 2 * x  # perfectly collinear with x
x2 <- runif(1000)
eps <- rnorm(1000)
y <- 5 + 3 * x + x1 + x2 + eps
full_lm <- lm(y ~ x + x1 + x2)
summary(full_lm)
#Call:
#lm(formula = y ~ x + x1 + x2)
#
#Residuals:
#     Min       1Q   Median       3Q      Max 
#-2.90396 -0.67761 -0.02374  0.71906  2.88259 
#
#Coefficients: (1 not defined because of singularities)
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept)  4.96254    0.06379   77.79   <2e-16 ***
#x            5.04771    0.03497  144.33   <2e-16 ***
#x1                NA         NA      NA       NA    
#x2           1.05833    0.11259    9.40   <2e-16 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 1.024 on 997 degrees of freedom
#Multiple R-squared:  0.9546,   Adjusted R-squared:  0.9545 
#F-statistic: 1.048e+04 on 2 and 997 DF,  p-value: < 2.2e-16

coef_lm <- as.matrix(summary(full_lm)$coefficients[,1])
coef_lm
#(Intercept)    4.962538
#x  5.047709
#x2 1.058327
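
If you want a vector with one entry per term (NA for the dropped ones), coef() on the fitted model keeps the aliased terms, and summary() flags them in its aliased component. Below is a minimal sketch on the same kind of simulated data as above. The seed and the names all_coef, s and p_all are mine, added for illustration, and padding the p-values back to full length is just one way to do it.

```r
# Simulated data with the same structure as the example above
# (the seed is an addition here, only to make the run reproducible).
set.seed(1)
x  <- rnorm(1000)
x1 <- 2 * x                      # perfectly collinear with x
x2 <- runif(1000)
y  <- 5 + 3 * x + x1 + x2 + rnorm(1000)
full_lm <- lm(y ~ x + x1 + x2)

# coef() returns one entry per model term; aliased terms are NA.
all_coef <- coef(full_lm)
length(all_coef)                   # 4: intercept + 3 predictors
names(all_coef)[is.na(all_coef)]   # "x1"

# summary() drops aliased rows from its coefficient table,
# but records which terms were dropped in $aliased.
s <- summary(full_lm)
nrow(s$coefficients)               # 3
names(which(s$aliased))            # "x1"

# P-values padded back to full length, NA where no estimate exists.
p_all <- setNames(rep(NA_real_, length(all_coef)), names(all_coef))
p_all[rownames(s$coefficients)] <- s$coefficients[, "Pr(>|t|)"]

# alias() shows how each dropped term depends on the retained ones.
alias(full_lm)$Complete
```

With 384 predictors, names(which(summary(full_lm)$aliased)) should list the terms that were dropped, and alias(full_lm) should show which other predictors they are linear combinations of. The NA entries cannot be filled in with meaningful estimates unless the collinear predictors are removed or recoded.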
