I have a dataset on U.S. manufacturing workers covering the past several decades, and I am particularly interested in the following variables:
- Month and year of starting the 1st manufacturing job, recorded separately and named "start_month_job_1" & "start_yr_job_1".
- Month and year of leaving the 1st manufacturing job, recorded separately and named "end_month_job_1" & "end_yr_job_1".
- The reason for leaving the job (e.g. retirement, firing, factory shutdown), named "leaving_reason".
- Month and year of starting the 2nd manufacturing job, recorded separately and named "start_month_job_2" & "start_yr_job_2".
- Month and year of leaving the 2nd manufacturing job, recorded separately and named "end_month_job_2" & "end_yr_job_2".
I am trying to create a variable that measures the duration of economic inactivity (idleness), which I define as the time between leaving the 1st job and starting the 2nd job. I have already created a version of this variable measured in years, as below:
gen econ_inactivity_duration_1 = start_yr_job_2 - end_yr_job_1
// Workers who had not started a second job by 2018, the latest year measured in the survey, are counted through 2018.
replace econ_inactivity_duration_1 = 2018 - end_yr_job_1 if missing(start_yr_job_2)
However, what I actually want is an economic inactivity duration variable that accounts for both the month and the year of leaving the 1st job and of starting the 2nd job. For instance, the duration for the worker in row 1 of the example data below should be 2 months (May 1993 to July 1993), rather than zero, which is what my code above computes.
* Example data (from dataex)
clear
input byte start_month_job_1 int start_yr_job_1 byte end_month_job_1 int end_yr_job_1 byte start_month_job_2 int start_yr_job_2 byte end_month_job_2 int end_yr_job_2 str14 leaving_reason
3 1990 5 1993 7 1993 4 1994 "Firm shutdown"
1 2003 7 2015 . . . . "job automation"
98 1979 98 2004 . . . . "Firm shutdown"
98 1975 98 2010 98 2010 98 2015 "job automation"
1 1983 12 1985 1 1986 . . "Firm shutdown"
98 1996 98 1998 . . . . "Firm shutdown"
end
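
To make the goal concrete, here is a minimal sketch of the kind of month-level calculation I have in mind, using Stata's ym() function to build monthly dates. It rests on two assumptions that I have not confirmed against the codebook: that a month value of 98 means "unknown month" and should be treated as missing, and that the survey window ends in December 2018. The names em1, sm2, end_ym_job_1, start_ym_job_2, and econ_inactivity_months_1 are placeholders made up for this sketch.

```stata
* Only the four variables needed for the gap, copied from the example rows
clear
input byte end_month_job_1 int end_yr_job_1 byte start_month_job_2 int start_yr_job_2
 5 1993  7 1993
 7 2015  .    .
98 2004  .    .
98 2010 98 2010
12 1985  1 1986
98 1998  .    .
end

* ASSUMPTION: 98 codes an unknown month, so set it to missing in working copies
generate byte em1 = end_month_job_1   if end_month_job_1   != 98
generate byte sm2 = start_month_job_2 if start_month_job_2 != 98

* ym(year, month) returns a monthly date (months since January 1960)
generate end_ym_job_1   = ym(end_yr_job_1, em1)
generate start_ym_job_2 = ym(start_yr_job_2, sm2)
format end_ym_job_1 start_ym_job_2 %tm

* Gap in months between leaving job 1 and starting job 2
generate econ_inactivity_months_1 = start_ym_job_2 - end_ym_job_1

* No second job recorded: count through the end of the survey window
* ASSUMPTION: the window ends in December 2018
replace econ_inactivity_months_1 = ym(2018, 12) - end_ym_job_1 if missing(start_yr_job_2)

list end_ym_job_1 start_ym_job_2 econ_inactivity_months_1, clean
```

With this sketch, row 1 gets a gap of 2 months and row 5 gets 1 month. Rows with an unknown month stay missing rather than being forced to a value, so how to handle the 98 codes (for example, falling back to the year-only difference) remains an open choice.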