I am pretty new to Stata programming.

My question: I need to reorder/reshape a dataset through (I guess) a macro.

I have a dataset of individuals, with a variable birthyear (year of birth) and a set of variables each containing weight in a given CALENDAR year, e.g.

birthyear | w_1990 | w_1991 | w_1992 | ... | w_2000
1989 | 7.2 | 9.3 | 10.2 | ... | 35.2
1981 | 33.2 | 35.3 | ...

I would like to create new variables containing weight at each age, e.g. Weight_age_1, Weight_age_2, etc. For the first observation in the example, this means leaving Weight_age_1 blank, putting 7.2 in Weight_age_2, and so on.

I have tried something like...

forvalues i = 1/10{
    capture drop weight_age_`i'
    capture drop birth_`i'

    gen birth_`i'=birthyear-1+`i'
    tostring birth_`i', replace

    gen weight_age_`i'= w_birth_`i'
}

... but it doesn't work.

Can you please help me?

1 Answer

Experienced Stata users wouldn't try to write a self-contained program here: they would see that the heart of the problem is a reshape.

clear 
input birthyear w_1990 w_1991  w_1992 
1989  7.2  9.3  10.2 
1981  33.2  35.3 37.6 
end 

gen id = _n 
reshape long w_, i(id)
rename _j year
gen age = year - birthyear
l, sepby(id)

         +-----------------------------------+
         | id   year   birthy~r     w_   age |
         |-----------------------------------|
      1. |  1   1990       1989    7.2     1 |
      2. |  1   1991       1989    9.3     2 |
      3. |  1   1992       1989   10.2     3 |
         |-----------------------------------|
      4. |  2   1990       1981   33.2     9 |
      5. |  2   1991       1981   35.3    10 |
      6. |  2   1992       1981   37.6    11 |
         +-----------------------------------+

To get the variables you say you want, you could reshape wide, but this long structure is by far the more convenient way to store these data for future Stata work.
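For completeness, the reshape wide step mentioned above might look like the following. This is only a sketch on the same toy data; the stub name weight_age_ is borrowed from the question, and age is defined as year - birthyear as in the listing above (use year - birthyear + 1 instead if 7.2 should land in weight_age_2, as the question describes).

```stata
* Sketch: rebuild the toy data, reshape long, then reshape wide by age.
clear
input birthyear w_1990 w_1991 w_1992
1989  7.2  9.3 10.2
1981 33.2 35.3 37.6
end

gen id = _n
reshape long w_, i(id) j(year)
gen age = year - birthyear

* year varies within id, so it must be dropped before going wide on age
drop year
rename w_ weight_age_
reshape wide weight_age_, i(id) j(age)
list
```

Only the ages that actually occur in the data (here 1 to 3 and 9 to 11) get a variable; combinations of id and age that were never observed are left missing.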

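P.S. The heart of your programming problem is that you are confusing the names of variables with their contents. In gen weight_age_`i' = w_birth_`i', the right-hand side expands to w_birth_1, w_birth_2, and so on, and Stata looks for variables with exactly those names; the values stored in birth_1 are never substituted into the name.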

That said, here is a "look-up" approach made to work:

clear 
input birthyear w_1990 w_1991  w_1992 
1989  7.2  9.3  10.2 
1981  33.2  35.3 37.6 
end 

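quietly forval j = 1/10 {
    * start with missing, then fill in wherever some year matches age `j'
    gen weight_`j' = .

    * `k' is a calendar year used to build the variable name w_`k'
    forval k = 1990/1992 {
        replace weight_`j' = w_`k' if (`k' - birthyear) == `j'
    }
}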

The essential trick is to do name manipulation using local macros. In Stata, variables are mainly for holding data; single-valued constants are better held in local macros and scalars. (Your sense of the word "macro" as meaning script or program is not how the term is used in Stata.)
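A small illustration of that distinction, as a sketch on the same toy data (the local macro k and the variable yr are made up for this example): a local macro is replaced by its contents before the command runs, whereas the contents of a variable never become part of a name.

```stata
clear
input birthyear w_1990 w_1991 w_1992
1989  7.2  9.3 10.2
1981 33.2 35.3 37.6
end

* A local macro is substituted as text before Stata runs the command,
* so this refers to the existing variable w_1990
local k 1990
display "w_`k'"
summarize w_`k'

* A variable holding 1990 does not help: Stata looks for a variable
* literally named w_yr and reports that it is not found
gen yr = 1990
capture noisily summarize w_yr
```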

As above: this is the data structure you ask for, but it is likely to be more problematic than that produced by reshape long.