I think I have a pretty simple request. I have the following dataframe, where "place" is a unique identifier, while start_date and end_date may overlap. The values are unique for each ID "place".
place start_date end_date value
1 2007-09-01 2010-10-12 0.5
2 2013-09-27 2015-10-11 0.7
...
What I need is to create a year-based variable, where I expand the time series by each year (starting from first of January (i.e. 2011-01-01) starts a new row for that particular "place" and "value". I mean something like this:
place year value
1 2007 0.5
1 2008 0.5
1 2009 0.5
1 2010 0.5
2 2013 0.7
2 2014 0.7
2 2015 0.7
...
There are some cases with overlap (ie. "place"=1 & "year"=2007) for two separate cases, where one observations starts with one year and the other observation continues from that year. In that case I would prefer the "value" that ends on that specific year. So if one observation for place=1 ends with 2007 in March and another place=1 starts with 2007 in April, year=2007 value for place=1 would be marked with the previous "ending" value if that makes sense.
I've only gotten this far:
library(data.table)
data <- data.table(dat)
data[,:=
(start_date = as.Date(start_date), end_date = as.Date(end_date))]
data[,num_mons:= length(seq(from=start_date, to=end_date, by='year')),by=1:nrow(data)]
I guess writing a loop makes the most sense?
Thank you for your help and advice.