How to output multiple regression results in R when data is huge

Question

I am working on three linear regression models in R. For example:

reg1=lm(y~x1,data=dataset)
reg2=lm(y~x2,data=dataset)
reg3=lm(y~x3,data=dataset)

I want to output the results of all three models in one table using the stargazer package. The code looks like this:

library(stargazer)
stargazer(reg1, reg2, reg3, title="Results", align=TRUE,type = "html",style = "qje", out="Table1.html")

The problem is that the dataset I am working with is huge, so reg1, reg2 and reg3 are huge as well (approximately 3.8 GB each). My computer cannot hold all three fitted models in memory at the same time. Intuitively, though, I only need the coefficients, standard errors, p-values and so on, and those should not take much space. How can I get around this problem?

Check out the broom package, which is designed for exactly this. Not sure how it handles complicated models, but if you just need to extract coefficients from linear models fit with lm(), that'll do the trick. - lefft
@benbolker, adding model=FALSE does not solve the problem. The object is still huge. - Allen
@lefft Thanks for the suggestion. I know how to output the results into a matrix one model at a time, but I am trying to output the three sets of results together in one nicely formatted table. That is why I am using stargazer. - Allen

Ben Bolker · Accepted Answer · 2017-11-22T14:25:31

It is possible to strip down an lm object considerably, but not all the way. (A more principled approach would be to rebuild stargazer along different lines, but that's a lot more work.) In particular, getting rid of the residuals and of the row/column names attached to some of the components helps a lot. The savings are largest in a case like the example above, where there is a single numeric predictor; they will be much smaller if there is a larger set of predictors. The largest remaining components that stargazer needs in order to work are the QR matrix and the fitted values.

Generate an example:

set.seed(101)
dd <- data.frame(y=rnorm(1e5),x1=rnorm(1e5),x2=rnorm(1e5),x3=rnorm(1e5))
reg1=lm(y~x1,data=dd)
reg2=lm(y~x2,data=dd)
reg3=lm(y~x3,data=dd)

Utility functions for exploring the sizes of components of the model:

inspect <- function(x=reg1) {
    sapply(x,function(z) round(as.numeric(object.size(z))/2^20,2))
}
s <- function(x) cat(format(object.size(x),units="Mb"),"\n")

Try them out:

s(reg1)  ## 24.4 Mb
inspect(reg1)
##  coefficients     residuals       effects          rank fitted.values 
##          0.00          6.87          1.53          0.00          6.87 
##        assign            qr   df.residual       xlevels          call 
##          0.00          7.63          0.00          0.00          0.00 
##         terms         model 
##          0.00          1.53

A function to remove components:

strip_lm <- function(x) {
    x$residuals <- NULL
    x$model <- NULL
    x$effects <- NULL
    names(x$fitted.values) <- NULL
    dimnames(x$qr$qr) <- NULL
    return(x)
}

Try it out:

reg1B <- strip_lm(reg1)
s(reg1B)
## 2.3 Mb
inspect(reg1B)
## coefficients          rank fitted.values        assign            qr 
##         0.00          0.00          0.76          0.00          1.53 
##  df.residual       xlevels          call         terms 
##        0.00          0.00          0.00          0.00

In this case, the only large components left are the fitted values and the QR decomposition, both of which stargazer needs, but deleting names has saved a lot of space. Deleting names won't help nearly as much if there are lots of predictors (i.e. the X matrix is wide, not just long).
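If the real problem is that the three full fits cannot sit in memory together, one possible pattern (a sketch, not something tested on a 3.8 GB fit) is to strip each model in the same expression that fits it, so the full lm object is never bound to a name and can be garbage-collected before the next fit. The block repeats strip_lm() from above so that it runs on its own; the names reg1B, reg2B and reg3B are only illustrative, and model=FALSE is the standard lm() argument that skips storing the model frame.

```r
# strip_lm() is copied from above so this block is self-contained.
strip_lm <- function(x) {
    x$residuals <- NULL
    x$model <- NULL
    x$effects <- NULL
    names(x$fitted.values) <- NULL
    dimnames(x$qr$qr) <- NULL
    return(x)
}

# Same simulated data as in the example above.
set.seed(101)
dd <- data.frame(y=rnorm(1e5), x1=rnorm(1e5), x2=rnorm(1e5), x3=rnorm(1e5))

# Fit and strip in one step: only the stripped copy is kept under a name,
# so at most one full fit needs to exist at any time.
reg1B <- strip_lm(lm(y~x1, data=dd, model=FALSE))
reg2B <- strip_lm(lm(y~x2, data=dd, model=FALSE))
reg3B <- strip_lm(lm(y~x3, data=dd, model=FALSE))
invisible(gc())  # optionally ask R to release the discarded full fits now

# Report the size of each stripped model.
for (m in list(reg1B, reg2B, reg3B)) {
    cat(format(object.size(m), units="Mb"), "\n")
}
```

Peak memory during each lm() call is still that of one full fit, so this only helps when a single model fits in memory but three do not.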

Try out stargazer to make sure we haven't deleted anything it needs:

library(stargazer)
res <- capture.output(stargazer(reg1B,reg2,reg3,
          title="Results", align=TRUE,type = "html",
          style = "qje", out="Table1.html"))
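That stargazer runs without error does not by itself show that the numbers in the table are unchanged. summary.lm() computes the residual sum of squares from the residuals component, so deleting the residuals outright may distort the standard errors and p-values reported by anything built on summary(). It is worth comparing coefficient tables before trusting a stripped model. The sketch below uses a hypothetical variant, strip_lm_keep_resid() (not part of the answer above), that drops only the names of the residuals rather than the residuals themselves, at the cost of one extra unnamed numeric vector of length n.

```r
set.seed(101)
dd <- data.frame(y=rnorm(1e5), x1=rnorm(1e5))
reg1 <- lm(y~x1, data=dd)

# Hypothetical variant of strip_lm(): keeps the residuals (which summary.lm()
# reads) but removes their names, which is where most of their size comes from.
strip_lm_keep_resid <- function(x) {
    x$model <- NULL
    x$effects <- NULL
    names(x$residuals) <- NULL
    names(x$fitted.values) <- NULL
    dimnames(x$qr$qr) <- NULL
    return(x)
}

reg1C <- strip_lm_keep_resid(reg1)

# Compare the full coefficient tables (estimates, standard errors,
# t values, p-values) of the original and the stripped model.
same_table <- isTRUE(all.equal(coef(summary(reg1)), coef(summary(reg1C))))
print(same_table)

# Size of the stripped object, for comparison with s(reg1).
cat(format(object.size(reg1C), units="Mb"), "\n")
```

The same comparison can be run on any other stripping function before its output is passed to stargazer.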
