I have already asked a question about storing coefficients and standard errors of several regressions in a single dataset.

Let me just reiterate the objective of my initial question:

I would like to run several regressions and store their results in a DTA file that I could later use for analysis. My constraints are:

  1. I cannot install modules (I am writing code for other people and am not sure what modules they have installed).
  2. Some of the regressors are factor variables.
  3. The regressions differ only in the dependent variable, so I would like to store it in the final dataset to keep track of which regression each coefficient/variance belongs to.

The solution suggested by Roberto Ferrer worked well on my test data, but it turns out not to work on some other data. The reason is that my sample changes slightly from one regression to the next, so a factor variable does not always take the same number of values in each regression. As a result, the fixed effects (created on the fly using i.myvar as a regressor) do not have the same cardinality across regressions.

Let's say that I decide to put year fixed effects (as in: year-specific intercepts) using i.year but in one regression there is no observation for the year 2006. That means that this particular regression will have one fewer regressor (the dummy corresponding to year==2006 does not get created), and as a result a smaller matrix that stores the coeffs.

This results in a conformability error when trying to stack the matrices together.

Is there a way to make the initial solution robust to a varying number of regressors? (Perhaps by saving each regression's results as a DTA file and then merging?)

I am still subject to the constraint that I cannot rely on external packages.

Frankly, too many words! Please give specific code and reproducible example(s). Otherwise this is likely to be judged off-topic. See stackoverflow.com/help/mcve - Nick Cox

1 Answer

You can follow the strategy of appending datasets, making small changes to the code in the question you reference:

clear
set more off

save test.dta, emptyok replace

foreach depvar in marriage divorce {

    // test data
    sysuse census, clear 
    generate constant = 1
    replace marriage = . if region == 4 

    // regression
    reg `depvar' popurban i.region constant, robust noconstant  // explicit constant, robust standard errors
    matrix result_matrix = e(b)\vecdiag(e(V))                   // grab coeffs and their variances in a 2xK matrix
    matrix rownames result_matrix = `depvar'_b `depvar'_v       // label the two rows (coeffs, variances)

    // get original column names of matrix
    local names : colfullnames result_matrix

    // get original row names of matrix (and row count)
    local rownames : rowfullnames result_matrix
    local c : word count `rownames'

    // make original names legal variable names
    local newnames
    foreach name of local names {
        local newnames `newnames' `=strtoname("`name'")'
    }

    // rename columns of matrix
    matrix colnames result_matrix = `newnames'

    // from matrix to dataset
    clear
    svmat result_matrix, names(col)

    // add matrix row names to dataset
    gen rownames = ""
    forvalues i = 1/`c' {
        replace rownames = "`:word `i' of `rownames''" in `i'
    }

    // append
    append using "test.dta"
    save "test.dta", replace

}

// list
order rownames
list, noobs

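The result is what you want: append matches variables by name, so a coefficient that is absent from one regression is simply left missing in that regression's rows. The drawback is that the source dataset is re-loaded on every pass through the loop, so it is read from disk once per regression.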

You may want to take a look at post and check if you can manage a more efficient solution. statsby could also work, but you need to find a smart way of renaming the stored variables.
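As a rough illustration of the post route, here is a minimal sketch that stores the results in long format, one row per (dependent variable, term) pair. A regression with a missing factor level then just contributes fewer rows, so nothing has to conform. The file name test_long.dta and the variable names depvar, term, b and v are made up for this example, and, unlike the code above, it lets regress add its own constant (reported as _cons) instead of using an explicit constant variable.

```stata
// Sketch only: test_long.dta, depvar, term, b and v are illustrative names.
tempname handle
postfile `handle' str32 depvar str32 term double b double v using "test_long.dta", replace

// load the data once, with the same test setup as above
sysuse census, clear
replace marriage = . if region == 4

foreach depvar in marriage divorce {
    quietly regress `depvar' popurban i.region, vce(robust)
    matrix b = e(b)
    matrix V = e(V)

    // one posted row per coefficient: name, estimate, variance
    local names : colfullnames b
    local k = colsof(b)
    forvalues j = 1/`k' {
        local term : word `j' of `names'
        post `handle' ("`depvar'") ("`term'") (b[1,`j']) (V[`j',`j'])
    }
}
postclose `handle'

// inspect the stored results
use "test_long.dta", clear
list, noobs sepby(depvar)
```

With this layout the data are loaded only once, and the marriage regression simply has no row for 4.region. If a wide layout is needed later, the term names would first have to be turned into legal variable names (for instance with strtoname(), as above).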