
I have a data set with ~10,740 ids, each with an entry year and an exit year for a program. An id can have multiple entries of differing lengths. A stay in the program can last just one year (e.g. entry_year = 1986, exit_year = 1986) or span multiple years (e.g. entry_year = 1990, exit_year = 1995).

I would like to count for each year the program has run (in the example below, from 1986 to 2004) how many ids were enrolled.

This is no problem when an id has been enrolled for only one year, but I need to count an id for every year between its entry year and exit year, inclusive. For example, an id with entry year 1990 and exit year 1995 should be counted as enrolled in each of 1990, 1991, 1992, 1993, 1994 and 1995.

I'm a bit stumped and would appreciate any suggestions.

id = c(1,1,1,3,3,3,5,5,5,5)
entry_year = c(1986, 1988, 1990, 1987, 2002, 2003,1988, 1989, 1990, 2000 )
exit_year = c(1987, 1988, 1997, 2001, 2002, 2005, 1988, 1989, 1995, 2004)
test <- data.frame(id, entry_year, exit_year)

R version 3.4.0, Windows 7 x64


1 Answer


Maybe like this:

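# Every year the program ran, taken from the data frame itself
years <- min(test$entry_year):max(test$exit_year)
# For each year, count the rows whose entry-to-exit interval covers it
data.frame(year = years, enrolled = sapply(years,
                function(x) sum(test$entry_year <= x & test$exit_year >= x)))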

Output:

   year enrolled
1  1986        1
2  1987        2
3  1988        3
4  1989        2
5  1990        3
6  1991        3
7  1992        3
8  1993        3
9  1994        3
10 1995        3
11 1996        2
12 1997        2
13 1998        1
14 1999        1
15 2000        2
16 2001        2
17 2002        2
18 2003        2
19 2004        2
20 2005        1
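
One caveat: this counts rows, not ids, so an id with two overlapping entries covering the same year would be counted twice. The sample data has no overlaps, so it makes no difference here. If overlaps can occur in the full data set (I'm assuming they might, the question doesn't say), a minimal sketch that counts each id at most once per year could look like this. The names `count_ids` and `result` are just placeholders I made up:

```r
# Same sample data as in the question
test <- data.frame(
  id         = c(1, 1, 1, 3, 3, 3, 5, 5, 5, 5),
  entry_year = c(1986, 1988, 1990, 1987, 2002, 2003, 1988, 1989, 1990, 2000),
  exit_year  = c(1987, 1988, 1997, 2001, 2002, 2005, 1988, 1989, 1995, 2004)
)

# Every year the program ran
years <- min(test$entry_year):max(test$exit_year)

# Hypothetical helper: number of distinct ids enrolled in year y
count_ids <- function(y) {
  covered <- test$entry_year <= y & test$exit_year >= y
  length(unique(test$id[covered]))
}

result <- data.frame(year = years, enrolled = sapply(years, count_ids))
result
```

On the sample data this should give the same table as above, since no id has overlapping entries.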

Hope this helps!