Stata Duplicate/Split Observations Each Time Variables not Equal

Question

I am working out of Stata SE 13 and have a dataset that looks something like this (I know this is a code block, but I used it for formatting purposes):

Obs var1 var2 masterID id1 id2 id3 id4 . . . idN
1    a     b     123   123 123 123 123 . . . 123
2    c     d     456   456 456 789 897 . . . 897

Sometimes the id variables change in an observation, and each time this happens I would like to create another observation including all other variables and make all other id's blank. Observation 1 is perfect, but observations like 2 are the issue. From the table above, the desired output is:

Obs var1 var2 masterId id1 id2 id3 id4 . . . idN
1    a     b     123   123 123 123 123 . . . 123   
2    c     d     456   456 456  .   .  . . .  .
3    c     d     789    .   .  789  .  . . .  .
4    c     d     897    .   .   .  897 . . . 897

I've attempted to use something like:

forvalues i = 1(1)N {
    local k = `i'+1
    expand 2 if id`i' != id`k', generate(mark`i')
    forvalues w= `k'(1)2014 {
         replace id`w' = "" if mark`i' == 1
         replace masterId = id`i' if mark`i' == 1
    }
}
* Repeat for forward values
* Some deleting of blank observations and original observations also

This works beautifully if the id's change once, but for anything that changes multiples times there is huge issues (makes a jumbled mess with data).

Any advice would be greatly appreciated! Thank you.

Dimitriy V. Masterov Dimitriy V. Masterov · Accepted Answer · 2016-03-16T17:31:11

Problems like this are usually easier when data is in long format. That is the approach I take here:

clear
input str1(var1 var2) masterID id1 id2 id3 id4 id5
"a" "b" 123   123 123 123 123 123
"c" "d" 456   456 456 789 897 897
end

reshape long id, i(masterID) j(i)
replace masterID = id if masterID!=id
reshape wide id, i(masterID) j(i)
order var1 var2 masterID

Stata Duplicate/Split Observations Each Time Variables not Equal

1 Answers