I have a dataset in which I would like to duplicate rows based on specific values.

Let's say I have a sample election dataset:

vote_share  county   year
0.6         A        2016
0.4         B        2016
0.2         C        2016
0.8         A        2012
0.1         B        2012
0.3         C        2012

I would like to copy each county's values into the intervening years: the 2012 values should also apply to 2013-2015, and the 2016 values to 2017-2019.

Should I be doing this with loops, or perhaps with the tidyverse?

1 Answer

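You can create the desired rows as a new dataframe, bind it to your original dataframe (here called df), and then use tidyr::fill to fill in the missing vote shares. Once the rows are sorted by county and year, fill carries the last observed vote share down into the NA rows that follow it: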

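library(tidyverse)

# Placeholder rows for the years that are missing from df
df_2 <- data.frame(county = rep(c("A", "B", "C"), each = 6),
                   year = rep(c(2013, 2014, 2015, 2017, 2018, 2019), 3),
                   vote_share = NA,
                   stringsAsFactors = FALSE)

# rbind matches columns by name, so the column order may differ from df
df_full <- rbind(df, df_2)

# Sort so each election year precedes its placeholder rows, then fill downward
df_full %>% 
  arrange(county, year) %>% 
  tidyr::fill(vote_share)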
   # A tibble: 24 x 3
   vote_share county  year
        <dbl> <chr>  <dbl>
 1        0.8 A       2012
 2        0.8 A       2013
 3        0.8 A       2014
 4        0.8 A       2015
 5        0.6 A       2016
 6        0.6 A       2017
 7        0.6 A       2018
 8        0.6 A       2019
 9        0.1 B       2012
10        0.1 B       2013
# ... with 14 more rows
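
If you would rather not type out the placeholder rows by hand, a possible variant is to let tidyr::complete generate the missing county/year combinations and then fill within each county. This is only a sketch: it rebuilds the sample data from the question as df, and it assumes the target range is 2012 through 2019 and that every county has a 2012 value to start from.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (rebuilt here so the example is self-contained)
df <- data.frame(
  vote_share = c(0.6, 0.4, 0.2, 0.8, 0.1, 0.3),
  county     = rep(c("A", "B", "C"), times = 2),
  year       = rep(c(2016, 2012), each = 3),
  stringsAsFactors = FALSE
)

df_full <- df %>%
  group_by(county) %>%
  # Add a row (with NA vote_share) for every year in the assumed range
  complete(year = 2012:2019) %>%
  arrange(year, .by_group = TRUE) %>%
  # Carry each election year's value down into the following years
  fill(vote_share, .direction = "down") %>%
  ungroup()
```

Grouping by county before fill keeps one county's values from leaking into the next, which matters if some county lacks a value for the first year in the range.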