Add rows with missing years by group

Question

I would like to create new rows in a data.frame for all missing years for each group (firm and type). The dataframe looks as follows:

minimal <- data.frame(firm = c("A","A","A","B","B","B","A","A","A","B","B","B"),
                  type = c("X","X","X","X","X","X","Y","Y","Y","Y","Y","Y"),
                  year = c(2000,2004,2007,2010,2008,2001,2002,2003,2007,2000,2001,2008),
                  value = c(1,3,7,9,9,2,3,3,7,5,9,15)
                  )

Dataframe:

firm type year value
A    X    2000     1
A    X    2004     3
A    X    2007     7
B    X    2010     9
B    X    2008     9
B    X    2001     2
A    Y    2002     3
A    Y    2003     3
A    Y    2007     7
B    Y    2000     5
B    Y    2001     9
B    Y    2008    15

Now, what I want to get is the following: I can see in the data that the minimum year is 2000 and the maximum is 2010. I want to add a row for each missing year for each combination of firm-type. E.g. for firm A and type X, I would like to add rows such that it looks like this:

Final output:

firm type year value
A    X    2000     1
A    X    2004     3
A    X    2007     7
A    X    2001     1
A    X    2002     1
A    X    2003     1
A    X    2005     3
A    X    2006     3
A    X    2008     7
A    X    2009     7
A    X    2010     7

Additionally, I want to write the value from the previous year into the column 'value' for the missing row for all subsequent years until a new non-missing row appears (as seen in the final output example).

I have not yet come up with any useful code, but what I have found so far is the following which might be the right direction:

setDT(minimal)[, .SD[match(2000:2010, year)],
                           by = c("firm","type")]

I don't really understand the concept of setDT and .SD, but this creates at least one row for each firm type combination. However, there is not content for year.

Thanks a lot in advance!

I think there are dupes for this. Check for ?complete from tidyr or ?expand.grid from base R or CJ from data.table — akrun
Okay I came up with min2 <- expand.grid(year = min(minimal$year):max(minimal$year), firm = unique(minimal$firm), type = unique(minimal$type)) and merge(min2,minimal, by = c("firm","type","year"), all.x = T). Now I only need to add the correct values to each row which I still don'tk now how to do. — lorenzbr
try this: library(dplyr); library(tidyr); minimal %>% group_by(firm, type) %>% complete(year = full_seq(year, 1)) %>% fill(value) — Sotos
Cool this is really nice code. However, I still have the problem that it takes the minimum and maximum year for each group(firm,type). I actually need the total minimum and maximum which often differs from the group min and max. — lorenzbr

Sotos Sotos · Accepted Answer · 2017-05-05T09:25:11

I could not find an exact dupe for this so here is a possible solution,

library(dplyr)
library(tidyr)

minimal %>% 
  group_by(firm, type) %>% 
  complete(year = full_seq(2000:2010, 1)) %>% 
  fill(value)

Add rows with missing years by group

3 Answers