
I am trying to write a loop to generate and fill in a dummy variable for whether an individual was a member of a particular party in the year in question. My data are in long format, with each observation being a person-year. They look like the following.

X1                  X2                  X3                 
AR, 1972-1981       PDC, 1982-1986      PFL, 1986-.  
MD, 1966-1980       PMDB, 1980-1988     PSB, 1988-.  
MD, 1966-1968       AR, 1968-1980       PDS, 1980-1985

Before the comma is the party and after are the years in which the person was a member of the party. Any help would be greatly appreciated!
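For illustration, the party code and both years can be pulled out of each spell string with a single regular expression. This is a minimal sketch that assumes every spell has the form "PARTY, start-end" with an uppercase party code. The variable names spell, party, syear and eyear are hypothetical.

```stata
* Toy data: one spell string per observation, in the form "PARTY, start-end"
clear
input str20 spell
"AR, 1972-1981"
"PFL, 1986-."
end

* One pattern captures party code, start year and end year;
* regexs(n) returns the n-th subexpression matched by regexm() in the if-condition
local pat "^([A-Z]+), *([0-9]+)-(.*)"
gen str10 party = regexs(1) if regexm(spell, "`pat'")
gen syear = real(regexs(2)) if regexm(spell, "`pat'")
* real(".") is missing, so an open-ended spell such as "1986-." gets a missing end year
gen eyear = real(regexs(3)) if regexm(spell, "`pat'")

list, noobs
```

Using real() instead of destring keeps the conversion inside the same generate step; the open-ended spells still need a decision about what their missing end year should mean.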

So far the code I have is:

rename X1 XA
rename X2 XB
rename X3 XC

foreach var of varlist XA XB XC {
  split `var', parse(,)
}
tabulate XA1, gen(p)
Please share the code you've already tried that doesn't work. (Andrew_Lvov)
foreach var of varlist X1 X2 X3 { split `var', parse (,) } tabulate X1, gen() (user4438802)
Oh, sorry, I thought this was a Python question. However, I suggest updating the question with your code; that usually helps to get answers. (Andrew_Lvov)
For me at least, the desired data structure is not clear. (Roberto Ferrer)
My goal is to end up with a party variable that reflects the year. So, for the first line, if the observation year were 1980, the party variable would be AR; if the observation year were 1985, the party would be PDC. (user4438802)
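If each person-year observation already carries its three spells, the party in force can be assigned by checking which spell contains the observation year. This is a minimal sketch assuming the spells have already been split into hypothetical variables p1-p3 (party), s1-s3 (start year) and e1-e3 (end year). Where two spells share a year (for example 1986 for PDC and PFL), the later spell overwrites the earlier one, which is an assumption you may want to change.

```stata
* Toy data: first line of the example, observed in 1980 and in 1985
clear
input str5 p1 s1 e1 str5 p2 s2 e2 str5 p3 s3 e3 year
"AR" 1972 1981 "PDC" 1982 1986 "PFL" 1986 . 1980
"AR" 1972 1981 "PDC" 1982 1986 "PFL" 1986 . 1985
end

* Start empty, then fill in the party of each spell that contains the year
gen str5 party = ""
forvalues v = 1/3 {
    * Missing values sort above every number in Stata, so a missing
    * end year (an open-ended spell) counts as still running
    replace party = p`v' if year >= s`v' & year <= e`v'
}

list year party, noobs
```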

2 Answers


Here's one way to do it. I had to make an assumption about what the missing year corresponds to in X3, so you will need to alter that.

/* Enter Data */
clear

input str20 X1 str20 X2 str20 X3                 
"AR, 1972-1981"       "PDC, 1982-1986"      "PFL, 1986-."  
"MD, 1966-1980"       "PMDB, 1980-1988"     "PSB, 1988-."  
"MD, 1966-1968"       "AR, 1968-1980"       "PDS, 1980-1985"
end

compress

/* Split X1,X2,X3 into party, start year and end year and create 3 ID variables that we need later */
forvalues v=1/3 {
    split X`v', parse(", " "-")
    gen id`v'=_n
}

/* Make years numeric, and get rid of the messy original data */
destring X12 X13 X22 X23 X32 X33, replace
replace X33 = 1990 if missing(X33) // enter your survey year here 
drop X1 X2 X3

/* stack the spells on top of each other */
stack (id1 X11 X12 X13) (id2 X21 X22 X23) (id3 X31 X32 X33), into(id party year1 year2) clear
drop _stack

/* Put the data into long format and fill in the gaps */
reshape long year, i(id party) j(p)
drop p
/* need this b/c people can be in more than one party in a given year */
egen idparty = group(id party), label
xtset idparty year
tsfill
carryforward id party, replace // carryforward is user-written: ssc install carryforward
drop idparty

/* create party dummies */
tab party, gen(DD_)

/* rename the dummies to have party affiliation at the end instead of numbers */
foreach var of varlist DD_* {
    levelsof party if `var'==1, local(party) clean
    rename `var' ind_`party'
}

drop party

/* get back down to one person-year observation */
collapse (max) ind_*, by(id year)

list id year ind_*, sepby(id) noobs

Following Dimitriy's lead (and interpretation), here is a slightly different way of doing it. I make a different assumption about the missing endpoints, i.e., I truncate the series to the last known years.

clear
set more off

input ///
str15 (XA                  XB                  XC)                 
"AR, 1972-1981"       "PDC, 1982-1986"     "PFL, 1986-."
"MD, 1966-1980"       "PMDB, 1980-1988"    "PSB, 1988-."
"MD, 1966-1968"       "AR, 1968-1980"    "PDS, 1980-1985"
end

list

*----- what you want? -----

// main
stack X*, into(X) clear
bysort _stack: gen id = _n
order id, first

split X, parse (, -)
rename (X1 X2 X3) (party sdate edate)

destring ?date, replace
gen diff = edate - sdate + 1
expand diff

bysort id party: replace sdate = sdate[1] + _n - 1

drop _stack X edate diff

// create indicator variables
tabulate party, gen(y)

// fix years with two or more parties
levelsof party, local(lp) clean
collapse (sum) y*, by(id sdate)

// rename
unab ly: y*
rename (`ly') (`lp')

list, sepby(id)