How to compare dates in Stata?

Question

I have the following dataset in Stata

Id | date1    | date2
------------------------
1  | 28mar2005| 30jun2005
2  | 28mar2005| 31dec2005
3  | 28mar2005| 28febr2005
3  | 28mar2005| 30apr2005
3  | 28mar2005| 31dec2005

The values of the second column are the same for all observations. The year considered is the same in date1 and date2.

The day in date2 is always the last day of the relative month.

The dates in date2 are not repeated and are sorted within Id.

Within Id, I want to keep the observations of the dataset such that the "time distance" between "mm" of date1 and "mm" of date2 is the smallest. In case of indifference between n observations I want to keep the last observations displayed among the n observations. Hence, I want to get

Id | date1    | date2
------------------------
1  | 28mar2005| 30jun2005
2  | 28mar2005| 31dec2005
3  | 28mar2005| 30apr2005

My idea would be

(a) generate an "absolute value distance" between date1 and date2

(b) find the minimum absolute value distance by Id

(c) If a certain Id is listed more than once, keep the observation with the minimum absolute value distance

Any suggestion?

Dimitriy V. Masterov Dimitriy V. Masterov · Accepted Answer · 2015-04-17T18:10:41

%td dates are integers, where zero is 01/01/1960. This means that subtraction will give you the days between them:

clear 
input Id  str9 date1 str9 date2
1   28mar2005 30jun2005
2   28mar2005 31dec2005
3   28mar2005 28feb2005
3   28mar2005 30apr2005
3   28mar2005 31dec2005
3   27mar2005 27feb2005
end

gen DATE1 =date(date1,"DMY")
gen DATE2 =date(date2,"DMY")
format DATE? %td
drop date1 date2
rename DATE?, lower

gen diff = abs(date1-date2)
gsort Id diff -date2
bys Id: keep if _n==1

list, clean noobs

Note that I added the last row to create an instance of the "indifference" scenario (if there are ties in terms of absolute distance, keep the later date2).

How to compare dates in Stata?

2 Answers