2
votes

I want generate the following endogenous lag (Y) variable

set Y=1 in the current routine year, if submission==1 and routineyear==1 in the previous routine year

set Y=2 in the current routine year, if sub==0 and routineyear==1 in the previous routine year

Otherwise=0 

Note though that "previous routine year" is not previous year, the intervals between routine years varies. This is actually what makes it hard for me to generate this variable.

Basically, I want to generate an endogenous variable that would capture state's behavior in their LAST routineyear.

To illustrate what I want to do:

Assume that country A had its routine year in 1990 - the same year the submission variable was also =1. This would generate Y=1.

Now, the next routineyear for country A is in 1992, where the submission=1 and routineyear=1 in that year. The endogenous lag in this should indicate A's previous behavior as in 1990 (Y=1).

Then, the next routineyear is in 1996 where submission=0 while routineyear=1. The endogenous lag in this case would be the value of A's previous behavior in 1992 (Y=1).

Then again, next routineyear is in 1998, where submission=1 and routineyear=1. The endogenous lag here should indicate A's previous behavior in the last routineyear, in 1996. that is: Y=2!.

This is how the endogenous lag should look like (based on the example above)

country year     submission routineyear  Y(endo lag)
A       1990          1            1     1  
A       1991          0            0     0
A       1992          1            1     1 
A       1993          1            0     0
A       1994          0            0     0
A       1995          0            0     0
A       1996          0            1     1
A       1997          0            0     0
A       1998          1            1     2
A       1999          0            0     0
A       2000          0            0     0
A       2001          0            1     1
A       2002          0            0     0
A       2003          1            1     2

I've been trying to do this using different logics but without success. One of the biggest problems is that routine year is different for each country, the intervals are not stable.

I believe that someone who can write proper codes/functions in R would be able to slove this puzzle. If not, I would appreciate all recommendations as how to proceed from here.

A sample from my real data:

structure(list(ccode = c(31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L), year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L), country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("Bahamas", "Barbados", "Belize", "Cuba", "Dominica", "Dominican Republic", "Guatemala", "Haiti", "Jamaica", "Mexico", "Trinidad and Tobago"), class = "factor"), submission = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L), routineyear = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L )), .Names = c("ccode", "year", "country", "submission", "routineyear"), class = "data.frame", row.names = c(NA, -243L ))

2

2 Answers

2
votes

Using :

library(data.table)
setDT(DF)

DF[, Y := 0
   ][routineyear == 1
     , Y := 1 + (shift(submission, fill = 1) == 0)
     , by = country][]

which gives (first 15 rows shown):

> DF
    ccode year country submission routineyear Y
 1:    31 1990 Bahamas          1           1 1
 2:    31 1991 Bahamas          0           0 0
 3:    31 1992 Bahamas          0           0 0
 4:    31 1993 Bahamas          0           1 1
 5:    31 1994 Bahamas          0           0 0
 6:    31 1995 Bahamas          1           0 0
 7:    31 1996 Bahamas          0           0 0
 8:    31 1997 Bahamas          1           1 2
 9:    31 1998 Bahamas          0           0 0
10:    31 1999 Bahamas          1           1 1
11:    31 2000 Bahamas          0           0 0
12:    31 2001 Bahamas          1           1 1
13:    31 2002 Bahamas          0           0 0
14:    31 2003 Bahamas          1           1 1
15:    31 2004 Bahamas          0           0 0
........

What this does:

  • setDT(DF) converts your dataframe to a data.table
  • Y := 0 sets Y to 0 by reference first
  • Filter for routineyear == 1
  • Update Y by reference such that Y is set to 1 if previous submission is 1 and to 2 is previous submission is 0
1
votes
library(dplyr)

select(dat2, -Y) %>% 
  filter(routineyear == 1L) %>% 
  group_by(country) %>% 
  mutate(Y = 2L - lag(submission, default = 1L)) %>% 
  ungroup() %>% 
  right_join(select(dat2, -Y)) %>% 
  mutate(Y = replace(Y, is.na(Y), 0L))

# # A tibble: 14 x 5
#    country  year submission routineyear     Y
#    <fct>   <int>      <int>       <int> <int>
#  1 A        1990          1           1     1
#  2 A        1991          0           0     0
#  3 A        1992          1           1     1
#  4 A        1993          1           0     0
#  5 A        1994          0           0     0
#  6 A        1995          0           0     0
#  7 A        1996          0           1     1
#  8 A        1997          0           0     0
#  9 A        1998          1           1     2
# 10 A        1999          0           0     0
# 11 A        2000          0           0     0
# 12 A        2001          0           1     1
# 13 A        2002          0           0     0
# 14 A        2003          1           1     2

all.equal(.Last.value, dat2)
# [1] TRUE

where dat2 is:

dat2 <- read.table(text = 
"country year     submission routineyear  Y
A       1990          1            1     1  
A       1991          0            0     0
A       1992          1            1     1 
A       1993          1            0     0
A       1994          0            0     0
A       1995          0            0     0
A       1996          0            1     1
A       1997          0            0     0
A       1998          1            1     2
A       1999          0            0     0
A       2000          0            0     0
A       2001          0            1     1
A       2002          0            0     0
A       2003          1            1     2
", header = TRUE)