omitting summary statistics from -tabstat- by subgroup if they are missing values

Question

I am writing a code to export a summary statistics table in Latex by using tabstat, by() and esttab.

Here you have a toy example that replicates the structure of my data:

cls
clear all
set more off

use auto, clear

// Create two groups to be used in the -by()- option
gen rep2="First dataset" if rep78>=3
replace rep2="Second dataset" if rep78<3 

// Recode "price" as completely missing in the first group
replace price=. if rep2=="First dataset"  

// Table
eststo: estpost tabstat weight price mpg trunk, ///
    column(statistics) statistics(count mean median sd) by(rep2) nototal

local sum_statistics "count(label(Observations)) mean(label(Mean) fmt(2)) p50(label(Median)) sd(label(Standard deviation) fmt(2))"

esttab using "table1.tex", replace type ///
    title("Summary Statistics")  ///
    cells("`sum_statistics'") ///   
    noobs nonum booktabs

The output displays summary statistics into two subtables, one for each dataset (as defined by rep2). These two datasets don't necessarily have the same variables: price is completely missing from the first dataset.

I would like to entirely omit the row of summary statistics of price for "First dataset" only (leaving it for "Second dataset"). This because since the variable price is missing for "First dataset", all its summary statistics are missing values. This is equivalent to omit the entire row of summary statistics in case "Observations" is equal to 0 in the specific by-group.

I looked into the documentation of tabstat but I am not quite sure about how to proceed. Do I have to use the drop() option of estout?

Many thanks, S

Roberto Ferrer Roberto Ferrer · Accepted Answer · 2015-09-21T12:53:40

As you mention, you can use the drop() option:

clear all
set more off

sysuse auto, clear

// Create two groups to be used in the -by()- option
gen rep2="First" if rep78>=3
replace rep2="Second" if rep78<3 

// Recode "price" as completely missing in the first group
replace price=. if rep2=="First dataset"  

// Table
eststo: estpost tabstat weight price mpg trunk, ///
    column(statistics) statistics(count mean median sd) by(rep2) nototal

local sum_statistics "count(label(Observations)) mean(label(Mean) fmt(2)) p50(label(Median)) sd(label(Standard deviation) fmt(2))"

esttab, replace type ///
    title("Summary Statistics")  ///
    cells("`sum_statistics'") ///   
    noobs nonum booktabs drop(First:price)

This involves using the full name and not just the variable name.

Notice I took out the blank space in the values of the grouping variable. That seems to be troublesome when calling esttab, but I leave that to you to explore.

omitting summary statistics from -tabstat- by subgroup if they are missing values

1 Answers