
I have a dataset on U.S. manufacturing workers covering the past few decades, and I am particularly interested in the following variables:

  1. Month and year of starting the 1st manufacturing job, recorded separately as "start_month_job_1" and "start_yr_job_1".
  2. Month and year of leaving the 1st manufacturing job, recorded separately as "end_month_job_1" and "end_yr_job_1".
  3. The reason for leaving the job (e.g. retirement, firing, factory shutdown), named "leaving_reason".
  4. Month and year of starting the 2nd manufacturing job, recorded separately as "start_month_job_2" and "start_yr_job_2".
  5. Month and year of leaving the 2nd manufacturing job, recorded separately as "end_month_job_2" and "end_yr_job_2".

I am trying to create a variable that measures the duration of economic inactivity (idleness), which I define as the time between leaving the 1st job and starting the next one. Using years only, I have created it as follows:

gen econ_inactivity_duration_1 = start_yr_job_2 - end_yr_job_1
* Workers who never start a 2nd job: count up to 2018, the latest year covered by the survey
replace econ_inactivity_duration_1 = 2018 - end_yr_job_1 if missing(start_yr_job_2)

However, I want an economic inactivity duration variable that uses both the month and the year of leaving the 1st job and of starting the 2nd. For instance, the duration for the worker in row 1 should be 2 months (May 1993 to July 1993), not the 0 that my code above computes.

clear
input byte start_month_job_1 int start_yr_job_1 byte end_month_job_1 int end_yr_job_1 byte start_month_job_2 int start_yr_job_2 byte end_month_job_2 int end_yr_job_2 str14 leaving_reason
 3 1990  5 1993  7 1993  4 1994 "Firm shutdown"
 1 2003  7 2015  .    .  .    . "job automation"
98 1979 98 2004  .    .  .    . "Firm shutdown"
98 1975 98 2010 98 2010 98 2015 "job automation"
 1 1983 12 1985  1 1986  .    . "Firm shutdown"
98 1996 98 1998  .    .  .    . "Firm shutdown"
end
98 looks like a code for unknown, so you can ignore those observations, impute months randomly, or guess 7 as a crude near-average. Watch out for imputing months in the same year. – Nick Cox
Thanks; indeed, I am excluding observations with the unknown value 98. However, I am curious: would you mind elaborating on your warning about imputing months in the same year? – nesta13
Suppose you don't know the start and end months for a job that starts and ends in the same year. Then it's vital that any imputation (e.g. draws from 1, ..., 12) respects the fact that the end follows the start. If you impute 7 for both start and end, then all such jobs have an implied duration of 0. You could impute 1, ..., 6 for the start and 7, ..., 12 for the end. There is probably literature on this, but I have no idea what it says. – Nick Cox
Great explanation, thanks! – nesta13
Or (easier) don't impute the start and end if the job starts and ends in the same year but you don't know when: just impute the duration, from 0 (start and end in the same month) to 11 (Jan to Dec). – Nick Cox
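A minimal Stata sketch of that last suggestion, adapted (as an assumption, not something stated in the comments) to the gap between jobs rather than to a single job's duration, since the gap is what the question measures. It assumes the variable names from the question's data example and Stata 14 or later for `runiformint()`; the seed value and the helper variables `same_yr_unknown` and `gap_months` are hypothetical choices for illustration.

```stata
* Recode the "unknown" month code 98 to missing
mvdecode end_month_job_1 start_month_job_2, mv(98)

* Job 1 ends and job 2 starts in the same year, but neither month is known
gen byte same_yr_unknown = end_yr_job_1 == start_yr_job_2 & !missing(end_yr_job_1) ///
    & missing(end_month_job_1) & missing(start_month_job_2)

* Gap in months where both monthly dates are known (missing otherwise)
gen gap_months = ym(start_yr_job_2, start_month_job_2) - ym(end_yr_job_1, end_month_job_1)

* Impute the gap directly: 0 (same month) to 11 (January to December)
set seed 2803   // arbitrary seed, only so the draws are reproducible
replace gap_months = runiformint(0, 11) if same_yr_unknown
```

Imputing the gap itself, rather than two months, guarantees it is never negative, which is the point of the warning above.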

1 Answer


There is probably a better way, but here is a crude method.

* Data example
clear
input end_month_job_1 end_yr_job_1 start_month_job_2 start_yr_job_2
5 1993 7 1993
end

* Count months from a fixed origin (here December 1959 = 0);
* the origin cancels out when taking differences
gen j1_end = (end_yr_job_1 - 1960) * 12 + end_month_job_1
gen j2_start = (start_yr_job_2 - 1960) * 12 + start_month_job_2

* Calculate difference
gen wanted = j2_start - j1_end

* Check the difference is non-negative (0 = new job started in the same month)
assert wanted >= 0

list

     +------------------------------------------------------------------------+
     | end_mo~1   end_yr~1   s~mont~2   s~yr_j~2   j1_end   j2_start   wanted |
     |------------------------------------------------------------------------|
  1. |        5       1993          7       1993      401        403        2 |
     +------------------------------------------------------------------------+
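The same month arithmetic is available through Stata's built-in `ym()` function, which returns a monthly date counted from January 1960, so the origin differs by one month from the crude method above while differences are identical. The sketch below also handles workers with no second job, mirroring the question's 2018 cutoff; the cutoff month of December 2018 is an assumption, since only the year is given. The variable names `j1_end_m`, `j2_start_m`, and `econ_inactivity_months` are hypothetical.

```stata
* Recode the "unknown" month code 98 to missing, if present
mvdecode end_month_job_1 start_month_job_2, mv(98)

* Monthly dates: ym() gives months since 1960m1 (missing if the month is missing)
gen j1_end_m   = ym(end_yr_job_1, end_month_job_1)
gen j2_start_m = ym(start_yr_job_2, start_month_job_2)
format j1_end_m j2_start_m %tm

* Months of inactivity between leaving job 1 and starting job 2
gen econ_inactivity_months = j2_start_m - j1_end_m

* No second job observed: count up to an assumed cutoff of 2018m12
replace econ_inactivity_months = ym(2018, 12) - j1_end_m if missing(start_yr_job_2)

list end_yr_job_1 j1_end_m j2_start_m econ_inactivity_months
```

With the single observation in the example above, `econ_inactivity_months` is 2, matching `wanted`.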