
I have a problem conditioning my dataset in Stata. Within each group, I want an observation in which a certain action is performed (as indicated by a variable) to stay in the dataset only depending on the past values of another variable. Suppose I have the following:

obs | id | action1 | action2 | year
1 | 1 | 1 | 0 | 2000
2 | 1 | 0 | 1 | 2001
3 | 1 | 0 | 1 | 2002
4 | 1 | 0 | 1 | 2002
5 | 1 | 0 | 1 | 2003
6 | 2 | 1 | 0 | 2000
7 | 2 | 1 | 0 | 2001
8 | 2 | 0 | 1 | 2002
9 | 2 | 0 | 1 | 2002
10 | 2 | 0 | 1 | 2003

For each group identified by 'id', I want to keep an observation only if action1 is performed, or if action1 has been performed no earlier than 2 years before action2 is performed. In this simplified example only observation 4 should be deleted. Please note that the 2 actions are not mutually exclusive and that they can be performed more than once within the same year, so looking at 2 observations in the past does not necessarily mean looking at 2 years in the past.

A solution that I am not able to implement in code would be: gen act1year = action1 * year; then, by id, store the values of act1year that are different from 0 somewhere (this is the step I cannot implement); and then, by id, keep if action1 == 1, or if action2[_n] == 1 and the range from year[_n]-2 to year[_n] contains at least one of the previously stored values.

I know my suggestion is probably not the easiest way to go, and I still cannot implement it; unfortunately, I have not managed to find code that helps me do this. I hope you can help me. Thanks

Francesco

For future questions, please post code with your failed attempts. It can only benefit you to show you've done your part to solve the problem. Getting feedback on what went wrong is an important part of the process. - Roberto Ferrer
Please note I added a paragraph stating what I would do in order to solve the question if I were able to code it. - user3193779

1 Answer


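The following makes some assumptions, noted below. In particular, it looks at the previous two observations rather than the previous two years.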

clear
set more off

input ///
obs  id  action1  action2  year 
1  1  1  0  2000 
2  1  0  1  2001 
3  1  0  1  2002 
4  1  0  1  2003 
5  2  1  0  2000 
6  2  0  1  2001 
7  2  1  0  2002 
8  2  0  1  2003
end

list, sepby(id)

*-----

bysort id (year) : keep if action1 | (action1[_n-1] + action1[_n-2] > 0)

list, sepby(id)

The expression in parentheses evaluates to one or zero depending on whether the inequality is true or false, respectively. It indicates whether action 1 was taken in either of the previous two observations.

You need to decide what to do with the first two observations of each group, as they can't be compared with exactly two previous observations (they don't exist). In the example above they are always kept: referring to a nonexistent observation returns missing, any sum involving missing is missing, and Stata treats missing as larger than any number, so the inequality is true.
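
If you would rather not keep the first two observations automatically, one option is to test each lag for equality with 1 instead of summing the lags. This is only a sketch under assumptions: the data below are a made-up variant of the example in which group 2 starts with an observation where action 1 is not taken, and `obs` is added to the sort order only to break possible ties in `year`.

```stata
clear
input obs id action1 action2 year
1 1 1 0 2000
2 1 0 1 2001
3 1 0 1 2002
4 1 0 1 2003
5 2 0 1 2000
6 2 1 0 2001
end

* An out-of-range subscript such as action1[0] returns missing, and
* missing == 1 is false, so a nonexistent lag never counts as action 1.
bysort id (year obs) : keep if action1 == 1 | action1[_n-1] == 1 | action1[_n-2] == 1

list, sepby(id)
```

Here observation 5 is dropped, whereas the summed version would keep it because the missing sum compares as greater than zero.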

You can also work with time-series operators (help tsvarlist, help xtset) and really respect the time variable. Here, I work with the previous two observations. That may or may not coincide with the previous two time points.
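
As a sketch of one way to respect the time variable without time-series operators (which may be useful if a year can repeat within an id, since xtset does not accept repeated time values within a panel), you can carry the most recent action-1 year forward and compare it with the current year. The assumptions here are mine: "within two years" is read as the current year minus the most recent action-1 year being at most 2, and `act1year`, `noact1` and `lastact1` are illustrative names.

```stata
clear
input obs id action1 action2 year
1 1 1 0 2000
2 1 0 1 2001
3 1 0 1 2002
4 1 0 1 2003
5 2 1 0 2000
6 2 0 1 2001
7 2 1 0 2002
8 2 0 1 2003
end

* Year in which action 1 was performed; missing otherwise.
generate act1year = year if action1 == 1

* Within id, order by year and put action-1 rows first within a year,
* so that a same-year action 1 counts for the other rows of that year.
generate byte noact1 = (action1 != 1)
bysort id (year noact1 obs) : generate lastact1 = act1year

* Carry the most recent action-1 year forward over rows without one.
by id : replace lastact1 = lastact1[_n-1] if missing(lastact1)

* Keep action-1 rows, plus action-2 rows at most two years after the
* latest action 1. If lastact1 is missing, the difference is missing,
* which is not <= 2, so rows with no earlier action 1 are dropped.
keep if action1 == 1 | (action2 == 1 & year - lastact1 <= 2)

sort obs
list, sepby(id)
```

With these data only observation 4 is dropped, because 2003 is three years after the last action 1 in group 1.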

I think your two actions are mutually exclusive, but you are not explicit about it.