I would like to identify the years in which each individual in a panel data set is observed and record that information in another variable. Individuals may be observed over several successive years, or with gaps of one or more years, after which consecutive yearly observations may follow again.
ID 1 in the df below, for instance, is observed in 2000 and 2001, while ID 2 is observed in 2000 and 2002, with a gap in 2001.
library(data.table)
df = data.table(Year = c(2000, 2000, 2001, 2001, 2002, 2002), ID = c(1, 2, 1, 3, 2, 3), V1 = rep("", 6))
df
Year | ID | V1
2000 | 1 |
2000 | 2 |
2001 | 1 |
2001 | 3 |
2002 | 2 |
2002 | 3 |
My desired output in V1 then contains, for each ID, a chain of the observed years:
Year | ID | V1
2000 | 1 | 00/01
2000 | 2 | 00/02
2001 | 1 | 00/01
2001 | 3 | 01/02
2002 | 2 | 00/02
2002 | 3 | 01/02
Or better, since the information does not need to be repeated on every observation of an ID: the observed years shown only on the first observation of each ID.
Year | ID | V1
2000 | 1 | 00/01
2000 | 2 | 00/02
2001 | 1 |
2001 | 3 | 01/02
2002 | 2 |
2002 | 3 |
Thanks for any hint!
split(df$Year, df$ID). – lmo
lengths(split(df$Year, df$ID)) or sapply(split(df$Year, df$ID), length) provides observation counts for each ID as a named vector, where the names are the ID values. – lmo
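Building on the split() idea from the comments, here is a minimal sketch of one way to get both desired outputs with data.table. The helper yy and the second column V2 are names introduced here for illustration and are not part of the question. It assumes one row per ID and year, and that "first observation" means the earliest year of each ID.

```r
library(data.table)

df <- data.table(Year = c(2000, 2000, 2001, 2001, 2002, 2002),
                 ID   = c(1, 2, 1, 3, 2, 3))

# Hypothetical helper: two-digit year labels, e.g. 2000 -> "00", 2001 -> "01"
yy <- function(y) sprintf("%02d", as.integer(y) %% 100L)

# Variant 1: every row of an ID carries the full chain of its observed years
df[, V1 := paste(yy(sort(unique(Year))), collapse = "/"), by = ID]

# Variant 2: only the earliest observation of each ID keeps the chain
df[, V2 := ifelse(Year == min(Year), V1, ""), by = ID]

# Base R cross-check using split(): a named vector with one chain per ID
chains <- sapply(split(df$Year, df$ID),
                 function(y) paste(yy(sort(unique(y))), collapse = "/"))

print(df)
```

V1 corresponds to the first desired table and V2 to the second. If an ID can appear more than once in the same year, every row in its earliest year would receive the chain in V2, so that case would need an extra rule.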