Some observations in my data set need to be split into two or three different observations. For example, take the following observation:

region  income   gdp   other
North   120      450   50

I need to split it into three observations with the same values for all variables except region, like this:

region  income   gdp   other
IL      120      450   50
MI      120      450   50
IN      120      450   50

I need something like:

if (region == "North") {
//create three new observations and delete the old one
}

Is it possible with Stata?

Where is your full region variable stored before you apply it? If it's in a dataset, just expand all variables and merge the two datasets on row numbers. - Fr.
@Fr. Thanks for the reply. It is in a dataset. I am not sure what you mean by "merging the two datasets". Not all observations need to be duplicated. Only those that have the value "North" in the region variable should be duplicated, and the value "North" must be changed to IL, MI, IN... I am not sure how I could merge. Could you elaborate on it? - CHEBURASHKA
Do you have a dataset with a variable holding regions and another one holding states? If so, expand the master data as needed, sort them identically and merge using the region/state dataset on _n. If you provide a data extract somewhere, perhaps it would be easier to show you how this might be able to work. - Fr.
Your dataset does not specify which region should be allocated to which state. You need to provide a list of the correspondences that you want to get. What you really want to get remains a mystery to me, sorry. - Fr.

1 Answer

It is difficult to work out the general problem here from your example. Note that

if region == "North" { 
      <code>
} 

does not work as you seem to expect, as it is equivalent to

if region[1] == "North" { 
      <code>
} 

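and is a once-only branch: the condition is evaluated a single time, using the first observation, rather than once per observation. This is documented at http://www.stata.com/support/faqs/programming/if-command-versus-if-qualifier/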
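
To make the difference concrete, here is a minimal sketch on a made-up two-observation dataset (the data values are assumptions for illustration only, not taken from the question). The first observation is deliberately not "North", so the if command skips its branch, while the if qualifier still finds the matching observation.

```stata
clear
input str5 region income
"South" 100
"North" 120
end

* if command: the condition is checked once, using region[1] ("South"),
* so the branch below is not executed even though a "North" row exists
if region == "North" {
    display "branch taken"
}

* if qualifier: the condition is checked for every observation
count if region == "North"
```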

This is legal:

expand 3 if region == "North" 

but you would need to follow with one-by-one replacements.

(LATER) A wild guess is that you are following on from the question "Stata. How to match values in 1:m relationship?" and are trying to re-invent merge. All I can say is that would be a major project for an experienced Stata programmer.

(STILL LATER)

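 * Tag each original observation so that its copies can be grouped later
 gen long obsid = _n 
 gen state = "" 
 gen isnorth = region == "North" 
 * Replace each "North" observation with three copies; others are untouched
 expand 3 if isnorth 
 * Within each group of copies, assign one state per copy
 bysort obsid : replace state = "IL" if isnorth & _n == 1 
 by obsid : replace state = "MI" if isnorth & _n == 2
 by obsid : replace state = "IN" if isnorth & _n == 3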
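
If the list of states gets longer, the one-by-one replacements could be collapsed into a single statement. The following is only a sketch of that variant, not a different method: it assumes the same obsid and isnorth helper variables as above, uses made-up example data, and relies on the word() string function to pick the k-th state from a space-separated list.

```stata
clear
input str5 region income gdp other
"North" 120 450 50
"South"  90 300 40
end

gen long obsid = _n
gen state = ""
gen isnorth = region == "North"
expand 3 if isnorth

* word("IL MI IN", k) returns the k-th word of the list;
* under by:, _n is the position of each copy within its obsid group
bysort obsid : replace state = word("IL MI IN", _n) if isnorth

list, sepby(obsid)
```

Observations that are not "North" keep an empty state, just as in the code above.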