
I'm looking at the Current Population Survey in Stata, although this question could apply to any survey with individual weights.

It's straightforward to generate a table showing the mean of a variable -- say wages -- over time given individual weights:

table qtr [aw=pworwgt], contents(mean wage)

What I'd like to do, ideally automatically, is show the average wage over time with the proportions of certain categories held fixed at their levels on a given date.

For example, say I have 6 educational categories (Less than HS, HS, Some College, AA, BA/BS, Grad School). I'd want to see how wages would differ if I fixed the educational proportions of the workforce at their, say, 2005 levels.

Ideally, the solution would not be resource intensive for variables with many categories. For example, I might want to do something similar with the CPS's detailed occupation variable, which has hundreds of levels.

My gut tells me "margins" may be part of the solution but I'm not familiar enough with that command... also, I'd like to be able to generate table output so I can graph in other software.

ETA: Here's how I tried to do this when fixing weights by age and sex: loop over all the data, compare each quarter's proportions to the base-quarter proportions, and then adjust the individual weights accordingly. However, this takes a really long time to run.

local start = tq(1994q1)
local end = tq(2014q4)

local base = tq(2006q1)
tempvar pop2006
tempvar cohort2006
tempvar poptemp
gen pworwgt_a = pworwgt

levelsof pesex, local(sex)

sum pworwgt if qtr == `base'
gen `pop2006' = r(N)*r(mean)
gen `cohort2006' = .
gen `poptemp' = .

forvalues age = 16/85 {
    foreach s in `sex' {
        sum pworwgt if age == `age' & pesex == `s' & qtr == `base'
        replace `cohort2006' = r(N)*r(mean)/`pop2006'
        forvalues q = `start'/`end' {
            sum pworwgt if qtr == `q'
            replace `poptemp' = r(N)*r(mean)
            sum pworwgt if age == `age' & pesex == `s' & qtr == `q'
            replace pworwgt_a = pworwgt_a*`cohort2006'/((r(N)*r(mean))/`poptemp') if age == `age' & pesex == `s' & qtr == `q'
        }
    }
}
Statalist is a good suggestion, however I disagree in that I think this is very much a programming question. I'm not asking for conceptual or methodological advice. I have (hopefully!) stated clearly what I intend to do, and that's really not up for discussion. It's just a matter of finding the right Stata command for doing so. - E. Vincenti
But if it's helpful, I've added the code I've already tried. - E. Vincenti
I will assume then that my addendum has addressed your concerns. Thanks. - E. Vincenti
Helpful suggestions. However, the answer does in fact depend on sex, as I'm holding both age and sex cohort proportions constant at 2006 Q1 levels. It's unfortunate, as the sex distinction doubles the number of calculations, but necessary. - E. Vincenti
The line sum pworwgt if qtr == q is for the purposes of getting the total population in each quarter. I'm adjusting the proportions of each sex/age combination (so say 27 year old women) as a percent of the total population. You'll see below that I assign the total (which you've helpfully recommended I use r(sum) for) to the poptemp variable. - E. Vincenti

1 Answer


I don't have scope to test this, but here are suggested simplifications to the code segment. I don't address the main question, which I don't understand, partly because there is no precise description of data structure in the question.

To summarize suggestions:

Use summarize, meanonly when the mean or total is all you need, and use r(sum) rather than r(N)*r(mean) to get a total.

Use scalars not variables for constants.

Shift repeated calculations to once-and-for-all calculations of variables. I think you can do even more of this, but I will stop here.
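As a small illustration of the first two suggestions, here is a sketch using the auto dataset that ships with Stata (a stand-in, not the CPS data). It shows that r(sum) gives the same total as r(N)*r(mean) and stores it in a scalar rather than a variable.

```stata
* Toy illustration with Stata's shipped auto dataset (not the CPS data)
sysuse auto, clear

* Total reconstructed from the count and the mean, as in the question
summarize weight if foreign == 1
display r(N)*r(mean)

* meanonly skips the variance calculation; r(sum) is the total directly
summarize weight if foreign == 1, meanonly
display r(sum)

* Keep the constant in a scalar instead of filling a whole variable with it
tempname total
scalar `total' = r(sum)
display `total'
```

A scalar holds one number, whereas gen or replace writes the same constant into every observation, which is wasteful on a large file.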

   local start = tq(1994q1)
   local end = tq(2014q4)
   local base = tq(2006q1)
   tempname pop2006 cohort2006 
   tempvar qassum qsum 

   // quarter-age-sex sums in a single variable 
   bysort qtr age pesex : gen double `qassum' = sum(pworwgt) 
   by qtr age pesex : replace `qassum' = `qassum'[_N] 

   // quarterly sums in a single variable 
   by qtr: gen double `qsum' = sum(pworwgt) 
   by qtr: replace `qsum' = `qsum'[_N] 

   gen pworwgt_a = pworwgt

   levelsof pesex, local(sex)

   sum pworwgt if qtr == `base', meanonly 
   scalar `pop2006' = r(sum)

   forvalues age = 16/85 {
       foreach s in `sex' {
           sum pworwgt if age == `age' & pesex == `s' & qtr == `base', meanonly 
           scalar `cohort2006' = r(sum)/`pop2006'
           // base-quarter share divided by the contemporaneous share (cell sum / quarter sum)
           replace pworwgt_a = pworwgt_a*`cohort2006'/(`qassum'/`qsum') if age == `age' & pesex == `s'
       }
   }
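
Taking the once-and-for-all idea one step further, the loops can probably be dropped entirely. What follows is an untested sketch, not a definitive implementation: it builds a small toy dataset (hypothetical values standing in for the CPS extract) with the question's variable names, and the helper variables cellsum, qsum, basecell and basetot are names I made up. Each weight is multiplied by the cell's base-quarter share divided by its share in its own quarter.

```stata
* Toy data standing in for the CPS extract (hypothetical values)
clear
set seed 12345
set obs 3000
gen qtr = tq(2005q4) + floor(3*runiform())      // 2005q4 to 2006q2
format qtr %tq
gen age = 16 + floor(5*runiform())              // ages 16 to 20
gen pesex = 1 + (runiform() < 0.5)              // 1 or 2
gen double pworwgt = 1000 + 500*runiform()
gen double wage = 10 + 0.5*(age - 16) + rnormal()

local base = tq(2006q1)

* Weight totals by quarter-age-sex cell and by quarter
bysort qtr age pesex: egen double cellsum = total(pworwgt)
bysort qtr: egen double qsum = total(pworwgt)

* Base-quarter totals, spread to every observation in the same age-sex cell
bysort age pesex: egen double basecell = total(pworwgt * (qtr == `base'))
egen double basetot = total(pworwgt * (qtr == `base'))

* Rescale so each cell's share in every quarter equals its base-quarter share
gen double pworwgt_a = pworwgt * (basecell/basetot) / (cellsum/qsum)

* Quarterly means under the fixed composition, as a small dataset for export
preserve
collapse (mean) wage [aw=pworwgt_a], by(qtr)
list
restore
```

The same pattern should carry over to education or occupation by replacing age pesex with the category variable in the two bysort lines, and the cost does not grow with a loop over levels. One caveat: a cell that is empty in some quarter cannot be reweighted, so that quarter's adjusted weights will not add up to the original quarterly total.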