3
votes

I'm trying to use the tabulate command in Stata to create a time series of frequencies. The problem arises when I try to combine the output of tabulate after running through each date. tabulate will not include 0 as an entry when no observation exists for a value of the variable in question. For instance, if I wanted to count the 10, 11 and 12 year olds in a class over a three-year period Stata might output (8) if only one of the groups were represented and thus we don't know which group the 8 students belonged to: it could be (0,8,0) or (0,0,8).

This is not a problem if the time series is short as the "Results" window shows which categories are or are not represented. I have a much longer time series to my data. Does anyone know of a solution/method that forces Stata to include zeroes in these tabulations? The relevant parts of my code follows:

# delimit;
set more off;
clear;
matrix drop _all;
set mem 1200m;
cd ;
global InputFile "/Users/.../1973-2010.dta";
global OutputFile "/Users/.../results.txt";

use $InputFile;
compress;

log using "/Users/.../log.txt", append;

gen yr_mn = ym(year(datadate), month(datadate));
la var yr_mn "Year-Month Date"

xtset, clear;
xtset id datadate, monthly;

/*Converting the Ratings Scale to Numeric*/;
gen LT_num = .;
replace LT_num = 1 if splticrm=="AAA";
replace LT_num = 2 if (splticrm=="AA"||splticrm=="AA+"||splticrm=="AA-");
replace LT_num = 3 if (splticrm=="A"||splticrm=="A+"||splticrm=="A-");
replace LT_num = 4 if (splticrm=="BBB"||splticrm=="BBB+"||splticrm=="BBB-");
replace LT_num = 5 if (splticrm=="BB"||splticrm=="BB+"||splticrm=="BB-");
replace LT_num = 6 if (splticrm=="B"||splticrm=="B+"||splticrm=="B-");
replace LT_num = 7 if (splticrm=="CCC"||splticrm=="CCC+"||splticrm=="CCC-");
replace LT_num = 8 if (splticrm=="CC");
replace LT_num = 9 if (splticrm=="SD");
replace LT_num = 10 if (splticrm=="D");

summarize(yr_mn);
local start = r(min);
local finish = r(max);

forv x = `start'/`finish' {;
    qui tab LT_num if yr_mn == `x', matcell(freq_`x');
};

log close;
3

3 Answers

2
votes

What you want is not an option with the tab command. If you want to display the results to the screen, you might be able to use table ..., missing successfully.

Instead of the loop, you could try the following, which I think will work for your purposes:

preserve
gen n = 1  // (n could be a variable that indicates if you want to include the row or not; or just something that never ==.)
collapse (count) n , by(LT_num yr_mn)
reshape wide n, i(yr_mn) j(LT_num)
mkmat _all , matrix(mymatname) 
restore
mat list mymatname

I think that is what you're going after (but can't tell how you use the matrices you are trying to generate).

P.S. I prefer to use the inlist function for things like:

replace LT_num = 2 if inlist(splticrm,"AA","AA+","AA-")
2
votes

This problem is addressed by tabcount. See the 2003 paper

http://www.stata-journal.com/article.html?article=pr0011

and download the program code and help files after getting a link by search tabcount.

0
votes

This is the solution that I used. Keith's is probably better, and I will explore his solution in the future.

I saved the row labels (using matrow) in a vector and used it as an index for a matrix of the correct dimensions initialized to zero. That way I could place each frequency into the matrix at the correct place, and keep all of the zeros. The solution follows the above code after "local finish=r(max)". [note that I include a counter to eliminate the first observations which are empty for this variable.]

local counter=0;
forv x = `first'/`last' {;
tab LT_num if yr_mn == `x', matrow(index_`x') matcell(freq_`x');
local rows = r(r); /*r(r) is number of rows for tabulate*/;

if `rows'!=0{;
    matrix define A_`x'=J(10,1,0);
    forv r=1/`rows'{;
        local a=index_`x'[`r',1];
        matrix define A_`x'[`a',1]=freq_`x'[`r',1];
    };
};
else {;
    local counter=`counter'+1;
};
};   


local start=`first'+`counter'+1;
matrix define FREQ = freq_`start';

forv i = `start'/`last' {;
    matrix FREQ = (FREQ,A_`i');
};