
I have a large text dataset in Stata with information about different research studies (the data are in wide form, one row per study). Originally, the dataset had a single condition variable (conditions_l) listing the disease condition studied in each study (e.g., cardiovascular disease or lymphoma). I went through and coded each condition into categories; for example, the variable code_c represents cancer.
Example code:

* code_c flags cancer studies; note that regexm() is case-sensitive
gen code_c=0

replace code_c=code_c+1 if regexm(conditions_l, "leukemia")
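As a minimal, self-contained illustration of this single-variable coding, here is a sketch on three made-up conditions (hypothetical data, not from the actual dataset):

```stata
* Hypothetical toy data: three studies with a single condition variable
clear
input str40 conditions_l
"acute myeloid leukemia"
"hodgkin lymphoma"
"cardiovascular disease"
end

* Flag studies whose condition text contains "leukemia" (case-sensitive match)
gen code_c=0
replace code_c=code_c+1 if regexm(conditions_l, "leukemia")

list conditions_l code_c
```

Only the first study is flagged; "hodgkin lymphoma" does not match because the pattern is "leukemia".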

Now my dataset has multiple condition variables per study: instead of just conditions_l, there are conditions_l1, conditions_l2, and so on through conditions_l212. I want to use a loop to apply this coding to all of the condition variables, but I haven't managed it so far. That is, if any of the condition variables for a given study contains "lymphoma", I want to replace code_c with code_c+1. How can I change the code to run across multiple variables?

Comment: Strictly, regexm() is a function, not a command. (Nick Cox)

Answer

There is much I don't understand about your sample code; for example, it seems to me you could simply have written replace code_c=1 if .... Still, building on the code you have, the following should do what I understand you want: code_c goes up by at most 1 per study, even if the term appears in several of its condition variables.

gen code_c=0
generate t=0
forvalues i = 1/212 {
    /* t becomes 1 if ANY condition variable matches, so code_c is
       incremented only once even if several variables mention it */
    replace t=1 if regexm(conditions_l`i', "leukemia")
}
replace code_c=code_c+t
drop t
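To see why the temporary variable t matters, here is a small self-contained sketch on made-up data (three hypothetical studies, with three condition variables standing in for the 212). It compares the approach above with naively adding 1 inside the loop, which increments a study once per matching variable rather than once overall.

```stata
* Hypothetical toy data: 3 studies, 3 condition variables
clear
input str30 conditions_l1 str30 conditions_l2 str30 conditions_l3
"chronic leukemia" "acute leukemia" ""
"lymphoma" "" ""
"" "" "leukemia"
end

* Approach above: t records whether ANY condition variable matches
gen code_c=0
generate t=0
forvalues i = 1/3 {
    replace t=1 if regexm(conditions_l`i', "leukemia")
}
replace code_c=code_c+t
drop t

* Naive version for comparison: adds 1 for EVERY matching variable
gen code_naive=0
forvalues i = 1/3 {
    replace code_naive=code_naive+1 if regexm(conditions_l`i', "leukemia")
}

list code_c code_naive
```

The first study gets code_c = 1 but code_naive = 2. The naive version is only what you want if code_c is meant to count matching variables rather than flag studies.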

With my initial comment in mind, I think this could have been written more simply as:

gen code_c=0
forvalues i = 1/212 {
    replace code_c=1 if regexm(conditions_l`i', "leukemia")
}
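If the condition variables are the only ones whose names begin with conditions_l, looping over a wildcard varlist with foreach avoids hard-coding the 212. This is an assumption worth checking: an old conditions_l variable, if still in the dataset, would also match the pattern. A minimal sketch on made-up data:

```stata
* Hypothetical toy data: 3 studies, 2 condition variables
clear
input str30 conditions_l1 str30 conditions_l2
"acute leukemia" "lymphoma"
"asthma" ""
"" "leukemia"
end

* Loop over every variable whose name starts with conditions_l,
* so the number of condition variables need not be hard-coded.
* Assumes no other variable (e.g. an old conditions_l) matches the pattern.
gen code_c=0
foreach v of varlist conditions_l* {
    replace code_c=1 if regexm(`v', "leukemia")
}

list
```

If the variables are stored contiguously in the dataset, the range varlist conditions_l1-conditions_l212 is another way to name them without the wildcard.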