I have a dataset containing various drugs and the dates they were supplied. I would like to create an indicator variable DIBP that takes a value of 1 if the same drug was supplied during both period 1 and period 2 of a given year, and 0 otherwise. Period 1 is 1 April to 30 June, and period 2 is 1 October to 31 December.
I have written the following code:
. input id month day year str10 drug
id month day year drug
1. 1 5 1 2003 aspirin
2. 1 11 1 2003 aspirin
3. 1 6 1 2004 aspirin
4. 1 5 1 2005 aspirin
5. 1 11 1 2005 aspirin
6. end
.
. gen date = mdy(month,day,year)
. format date %d
.
. gen period = 1 if inlist(month,4,5,6)
(2 missing values generated)
. replace period = 2 if inlist(month,10,11,12)
(2 real changes made)
.
. label define plab 1 "1 April to 30 June" 2 "1 October to 31 December"
. label value period plab
.
. * Generate indicator
. gen DIBP = 0
. label var DIBP "Drug In Both Periods"
.
. bysort id year: replace DIBP = 1 if drug[period==1] == "aspirin" & drug[period==2] == "aspirin"
(0 real changes made)
.
. list
     +---------------------------------------------------------------------------------+
     | id   month   day   year      drug        date                     period   DIBP |
     |---------------------------------------------------------------------------------|
  1. |  1       5     1   2003   aspirin   01may2003         1 April to 30 June      0 |
  2. |  1      11     1   2003   aspirin   01nov2003   1 October to 31 December      0 |
  3. |  1       6     1   2004   aspirin   01jun2004         1 April to 30 June      0 |
  4. |  1       5     1   2005   aspirin   01may2005         1 April to 30 June      0 |
  5. |  1      11     1   2005   aspirin   01nov2005   1 October to 31 December      0 |
     +---------------------------------------------------------------------------------+
I would expect DIBP to take a value of 1 for observations 1, 2, 4 and 5 (because aspirin was supplied during both periods in 2003 and 2005) and a value of 0 for observation 3 (because aspirin was supplied during only one period in 2004), but instead DIBP is 0 for every observation. Where am I going wrong? Thank you.
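A likely cause, as far as I can tell: the expression inside the square brackets is treated as an observation number, not as a filter. For each observation, period==1 evaluates to 1 or 0, so drug[period==1] refers to either drug[1] (the first observation in the by-group) or drug[0] (which is always the empty string). The two conditions can therefore never both hold for the same observation. Below is a minimal sketch of one possible alternative, using egen's max() within groups of id, year and drug. The helper variables in_p1 and in_p2 are names made up for illustration, and the sketch assumes the indicator should be defined per drug rather than only for aspirin.

```stata
clear
input id month day year str10 drug
1  5 1 2003 aspirin
1 11 1 2003 aspirin
1  6 1 2004 aspirin
1  5 1 2005 aspirin
1 11 1 2005 aspirin
end

* Same period coding as in the original post
gen period = 1 if inlist(month, 4, 5, 6)
replace period = 2 if inlist(month, 10, 11, 12)

* in_p1 / in_p2 (hypothetical helper names): 1 if any supply of this drug
* fell in period 1 / period 2 for this id and year, 0 otherwise.
* (period == 1) evaluates to 0 when period is missing.
bysort id year drug: egen byte in_p1 = max(period == 1)
bysort id year drug: egen byte in_p2 = max(period == 2)

* Indicator is 1 only when the drug appears in both periods
gen byte DIBP = in_p1 & in_p2
label var DIBP "Drug In Both Periods"

list id year drug period DIBP, sepby(year)
```

With the example data, this should set DIBP to 1 for the 2003 and 2005 rows and to 0 for the single 2004 row. Note that supplies outside both periods (for example, in January) would also be flagged with 1 if the same drug appears in both periods of that year; whether that is desirable depends on how the indicator will be used.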