I would like to create new rows in a data.frame for all missing years for each group (firm and type). The dataframe looks as follows:
minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
value = c(1,3,7,9,9,2,3,3,7,5,9,15)
)
Dataframe:
firm type year value
A X 2000 1
A X 2004 3
A X 2007 7
B X 2010 9
B X 2008 9
B X 2001 2
A Y 2002 3
A Y 2003 3
A Y 2007 7
B Y 2000 5
B Y 2001 9
B Y 2008 15
Now, what I want to get is the following: I can see in the data that the minimum year is 2000 and the maximum is 2010. I want to add a row for each missing year for each combination of firm-type. E.g. for firm A and type X, I would like to add rows such that it looks like this:
Final output:
firm type year value
A X 2000 1
A X 2004 3
A X 2007 7
A X 2001 1
A X 2002 1
A X 2003 1
A X 2005 3
A X 2006 3
A X 2008 7
A X 2009 7
A X 2010 7
Additionally, I want to write the value from the previous year into the column 'value' for the missing row for all subsequent years until a new non-missing row appears (as seen in the final output example).
I have not yet come up with any useful code, but what I have found so far is the following which might be the right direction:
setDT(minimal)[, .SD[match(2000:2010, year)],
by = c("firm","type")]
I don't really understand the concept of setDT and .SD, but this creates at least one row for each firm type combination. However, there is not content for year.
Thanks a lot in advance!
?complete
fromtidyr
or?expand.grid
frombase R
orCJ
fromdata.table
– akrunmin2 <- expand.grid(year = min(minimal$year):max(minimal$year), firm = unique(minimal$firm), type = unique(minimal$type))
andmerge(min2,minimal, by = c("firm","type","year"), all.x = T)
. Now I only need to add the correct values to each row which I still don'tk now how to do. – lorenzbrlibrary(dplyr); library(tidyr); minimal %>% group_by(firm, type) %>% complete(year = full_seq(year, 1)) %>% fill(value)
– Sotosyear = full_seq(2000:2010,1)
. Thanks! – lorenzbr