
Suppose I have a long dataset in Stata in which each observation is categorized by vartype, which takes the values A to D.

list var1 var2 var3 vartype in 1/10

     +--------------------------------------+
     |  var1   var2          var3   vartype |
     |--------------------------------------|
  1. | 1:Yes      1        900000         A |
  2. | 1:Yes      1             0         C |
  3. | 1:Yes      1             0         A |
  4. | 1:Yes      1       1000000         D |
  5. | 1:Yes      1       8000000         C |
     |--------------------------------------|
  6. | 1:Yes      1       3100000         C |
  7. | 1:Yes      1             0         B |
  8. | 1:Yes      1       4000000         A |
  9. | 1:Yes      1             .         A |
 10. | 1:Yes      1   1.00000e+12         B |
     +--------------------------------------+   

I want to reshape it to wide and rename each original variable (var1 var2 var3) to a different name (say inpatient outpatient cost). After the reshape, I also want each code of vartype (A to D) to be replaced by a category name (chol diab hyper cancer).

For example, after reshape wide, I will get var1A, var1B, var1C, etc. and want to rename them as inpatient_chol, inpatient_diab, inpatient_hyper, etc. The same should apply to the other variables: var2 = outpatient and var3 = cost.

For now, all I know is the code below. I am looking for other ways to do this, such as a nested loop or perhaps even simpler code.

reshape wide var1 var2 var3, i(hhid pid) j(vartype) string

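* First pass: replace the vartype suffix (A to D) with the condition name.
foreach y in var1 var2 var3 {
    rename `y'A `y'cholesterol
    rename `y'B `y'diabetes
    rename `y'C `y'hypertension
    rename `y'D `y'cancer
}

* Second pass: replace the var1 to var3 stubs with descriptive prefixes.
foreach x in cholesterol diabetes hypertension cancer {
    rename var1`x' has_`x'
    rename var2`x' inpatient_`x'
    rename var3`x' cost_`x'
}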

I know I can rename and recode each variable and each vartype before reshaping it into wide. I just want to know if there's another way for a wide dataset.

rename allows many different syntaxes. Offhand I don't see that any offers much improvement on what you already have. - Nick Cox
Your code is clean and clear. Straining to write shorter code could be fun, but it could just seem obfuscated. Detailed answers proving me wrong would be fine. - Nick Cox
Thank you for your reply, sir. It appears that I won't be able to prove you wrong, as I was just looking for how others might write the code above. - Insan Alfarizy
@JR96 has nicely filled in the gap. Their warnings are apposite too. - Nick Cox

1 Answer


Your code looks great as is, and would be how I would be inclined to rename the variables in my own code. If you are interested in something shorter (albeit less explicit) you could take advantage of the * syntax in the rename command. In two lines of code:

rename (*A *B *C *D) (*cholesterol *diabetes *hypertension *cancer)
rename (var1* var2* var3*) (has_* inpatient_* cost_*)
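
To see what the two wildcard renames do without touching real data, here is a minimal sketch on a throwaway dataset. The empty variables are only stand-ins for the result of reshape wide (an assumption for illustration), and the dryrun option is used first so that rename reports the planned changes before anything is actually renamed.

```stata
* Throwaway example: build empty wide-style variables var1A ... var3D.
clear
set obs 1
foreach v in var1 var2 var3 {
    foreach s in A B C D {
        generate `v'`s' = .
    }
}

* Step 1: suffix codes -> condition names.
* With dryrun, rename only reports old and new names; nothing changes.
rename (*A *B *C *D) (*cholesterol *diabetes *hypertension *cancer), dryrun
rename (*A *B *C *D) (*cholesterol *diabetes *hypertension *cancer)

* Step 2: var1 to var3 stubs -> descriptive prefixes.
rename (var1* var2* var3*) (has_* inpatient_* cost_*)

* Inspect the resulting names.
describe, simple
```

Each * in a new-name pattern stands for whatever the * in the corresponding old-name pattern matched, which is why the order of the two parenthesized lists matters.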

Just be careful that, for example, you don't have other variables ending with A or beginning with var1 that you do not wish to rename. The complete PDF manual for rename has some other handy renaming tricks.
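
If clashes of that kind are a concern, one option is to avoid wildcards and spell out the mapping in a nested loop, so that only the twelve intended variables are touched. This is a sketch rather than a recommended replacement for your code; the local macro names (stubs, names, codes, conds) are hypothetical, and it assumes the data are already in wide form with variables var1A through var3D.

```stata
* Parallel lists: the i-th stub maps to the i-th name,
* and the j-th code maps to the j-th condition.
local stubs var1 var2 var3
local names has inpatient cost
local codes A B C D
local conds cholesterol diabetes hypertension cancer

forvalues i = 1/3 {
    local stub : word `i' of `stubs'
    local name : word `i' of `names'
    forvalues j = 1/4 {
        local code : word `j' of `codes'
        local cond : word `j' of `conds'
        * e.g. var1A -> has_cholesterol
        rename `stub'`code' `name'_`cond'
    }
}
```

This is longer than either the original loops or the two-line version, but each rename names exactly one existing variable, so a variable that merely happens to end in A or start with var1 is left alone.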