2
votes

My dataset includes TWO main variables X and Y.

  • Variable X represents distinct codes (e.g. 001X01, 001X02, etc) for multiple computer items with different brands.
  • Variable Y represents the tax charged for each code of variable X (e.g. 15 = 15% for 001X01) at a store.

I've created categories for these computer items using dummy variables (e.g. HD dummy variable for Hard-Drives, takes value of 1 when variable X represents a HD, etc). I have a list of over 40 variables (two of them representing X and Y, and the rest is a bunch of dummy variables for the different categories I've created for computer items).

I would like to display the averages of all these categories using a loop in Stata, but I'm not sure how to do this.

For example the code:

mean Y if HD == 1


Mean estimation                     Number of obs    =       5

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
   Tax       |        7.1   2.537716      1.154172    15.24583

gives me the mean Tax for the category representing Hard Drives. How can I use a loop in Stata to automatically display all the mean Taxes charged for each category? I would do it by hand without a problem, but I want to repeat this process for multiple years, so I would like to use a loop for each year in order to come up with this output.

My goal is to create a separate Excel file with each of the computer categories I've created (38 total) and the average tax for each category by year.

2

2 Answers

2
votes

Why bother with the loop and creating the indicator variables? If I understand correctly, your initial dataset allows the use of a simple collapse:

clear all
set more off

input ///
code tax str10 categ
1 0.15 "hd"
2 0.25 "pend"
3 0.23 "mouse"
4 0.29 "pend"
5 0.16 "pend"
6 0.50 "hd"
7 0.54 "monitor"
8 0.22 "monitor"
9 0.21 "mouse"
10 0.76 "mouse"
end

list

collapse (mean) tax, by(categ)

list

To take to Excel you can try export excel or put excel.

Run help collapse and help export for details.


Edit

Because you insist, below is an example that gives the same result using loops. I assume the same data input as before. Some testing using this example database with expand 1000000, shows that speed is virtually the same. But almost surely, you (including your future you) and your readers will prefer collapse. It is much clearer, cleaner and concise. It is even prettier.

levelsof categ, local(parts)
gen mtax = .

quietly {
    foreach part of local parts {
        summarize tax if categ == "`part'", meanonly
        replace mtax = r(mean) if categ == "`part'"
    }
}

bysort categ: keep if _n == 1
keep categ mtax

Stata has features that make it quite different from other languages. Once you start getting a hold of it, you will find that many things done with loops elsewhere, can be made loop-less in Stata. In many cases, the latter style will be preferred.

See corresponding help files using help <command> and if you are not familiarized with saved results (e.g. r(mean)), type help return.

1
votes

A supplement to Roberto's excellent answer: After collapse, you will need a loop to export the results to excel.

levelsof categ, local(levels)
foreach x of local levels {
export  excel  `x', replace
}

I prefer to use numerical codes for variables such as your category variable. I then assign them value labels. Here's a version of Roberto's code which does this and which, for closer correspondence to your problem, adds a "year" variable

input code tax  categ year
1 0.15  1 1999
2 0.25  2 2000
3 0.23  3 2013
4 0.29  1 2010
5 0.16  2 2000
6 0.50  1 2011
7 0.54  4 2000
8 0.22  4 2003
9 0.21  3 2004
10 0.76 3 2005
end

#delim ;
label define catl
1 hd
2 pend
3 mouse
4 monitor
;
#delim cr
label values categ catl
collapse (mean) tax, by(categ year)

levelsof categ, local(levels)
foreach x of local levels {
export  excel  `:label (categ) `x'', replace
}

The #delim ; command makes it possible to easily list each code on a separate line. The"label" function in the export statement is an extended macro function to insert a value label into the file name.