I have data on women who married and sometimes changed surnames over the period 1990-1999. However, I do not always know the exact year the name change took place, only that the surname changed sometime between year x and year y. In the original data, the old surname has only been crossed over and the new surname has been written next to it, which is indicated in the column "crossed_over". For example, Sarah Smith changed her name to Sarah Draper sometime in the period 1994-1999.
What I would like is that each woman have a unique surname for each year, like Liza Moore who changed her name to Liza Neville, preferably taking an average value when assigning a surname, using the column "crossed_over". For example, Sarah Smith would become Sarah Draper in 1997 and Mary King would become Mary Fisher in 1997 or 1998.
Does anyone have a suggestion to how I can achieve this using the example below?
library(tidyverse)
id <- rep(1:4, each = 10)
year <- rep(1990:1999, 4)
first_name <- c(rep("molly", 10), rep("sarah", 10), rep("mary", 10), rep("liza", 10))
last_name <- c(rep("johnson", 10), rep("smith", 4), rep("smith draper", 6), rep("king", 5), rep("king fisher", 5),
rep("moore", 7), rep("neville", 3))
crossed_over <- c(rep(NA, 10), rep(NA, 4), rep("smith", 6), rep(NA, 5), rep("king", 5), rep(NA, 10))
df <- tibble(id, year, first_name, last_name, crossed_over)