I have a large data frame that looks like this. I am pasting the dput output of my df below:
df <- structure(list(id = c(32, 32, 32, 32, 32, 32, 32, 12668031110,12668031110, 12668031110), survey_date = structure(c(17167, 17167,17167, 17167, 17167, 17167, 17167, 15034, 15034, 15034), class = "Date"),survey_year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017,2011, 2011, 2011), mom_dob = c(1991, 1991, 1991, 1991, 1991,1991, 1991, 1987, 1987, 1987), date = structure(c(10592,10957, 11323, 11688, 12053, 12418, 12784, 14304, 14669, 15034), class = "Date"), date_year = c(1999, 2000, 2001, 2002,2003, 2004, 2005, 2009, 2010, 2011), mom_age = c(7, 8, 9,10, 11, 12, 13, 21, 22, 23), newborn = c(0, 0, 0, 0, 0, 0,0, 0, 0, 0), stock = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), family500_year = c(1,1, 1, 1, 1, 1, 1, 0, 0, 0), nchild1 = c(2015, 2015, 2015,2015, 2015, 2015, 2015, NA, NA, NA), nchild2 = c(NA, NA,NA, NA, NA, NA, NA, 2010, 2010, 2010), nchild3 = c(NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_), nchild4 = c(NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_), nchild5 = c(NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_), nchild6 = c(NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), nchild7 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), nchild8 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA), nchild9 = c(NA, NA,NA, NA, NA, NA, NA, NA, NA, NA), nchild10 = c(NA, NA, NA,NA, NA, NA, NA, NA, NA, NA), educcat = c(2, 2, 2, 2, 2, 2,2, 3, 3, 3), educcat_college = c(0, 0, 0, 0, 0, 0, 0, 1,1, 1), hh_income_net = c(3410, 3410, 3410, 3410, 3410, 3410,3410, 7978.7001953125, 7978.7001953125, 7978.7001953125),hh_income_annual_usd = c(10912, 10912, 10912, 10912, 10912,10912, 10912, 25531.840625, 25531.840625, 25531.840625),hh_income_annual_log = c(9.29761838008324, 9.29761838008324,9.29761838008324, 9.29761838008324, 9.29761838008324, 9.29761838008324,9.29761838008324, 10.1476816041898, 10.1476816041898, 10.1476816041898), marital_stat = c(10, 10, 10, 10, 10, 10, 10, 20, 20, 20), maritalcat = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1), rural = c(1,1, 1, 1, 1, 1, 1, 0, 0, 0), age_sq = c(625, 625, 625, 625,625, 625, 625, 529, 529, 529), emp_stat = c(3, 3, 3, 3, 3,3, 3, 6, 6, 6), occupation = c("98", "98", "98", "98", "98","98", "98", "48", "48", "48"), disability_stat = c(2, 2,2, 2, 2, 2, 2, 2, 2, 2), weight = c(1039, 1039, 1039, 1039,1039, 1039, 1039, 1457, 1457, 1457), region = c(2, 2, 2,2, 2, 2, 2, 12, 12, 12), birth_country = c(1, 1, 1, 1, 1,1, 1, 1, 1, 1), birth_citizenship = c(1, 1, 1, 1, 1, 1, 1,1, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df", "tbl","data.frame"))
It's a df containing the fertility history of each woman. There are 18 rows for each woman because I'm reconstructing her births from the past 18 years.
Now, what I'm trying to do is to generate a dummy variable for newborn, where the value of nchild1, nchild2, nchild3, etc matches date_year. For example, for woman id #62, if nchild1 is 2004, I want newborn to be 1 in the same row as date_year=2004 AND 1 in the same row as date_year=2005 (and the rest to be 0s). Relatedly, I want stock (i.e. the number of children) to turn to 1 the year the first child is born (in this case, 2004) and remain 1 until the last year observed (2017).
## example df of what I have now:
id date_year newborn stock nchild1 nchild2 nchchild3
62 1996 0 2 2004 2005 NA
62 1997 0 2 2004 2005 NA
62 1998 0 2 2004 2005 NA
62 1999 0 2 2004 2005 NA
62 2000 0 2 2004 2005 NA
62 2001 0 2 2004 2005 NA
62 2002 0 2 2004 2005 NA
62 2003 0 2 2004 2005 NA
62 2004 0 2 2004 2005 NA
62 2005 0 2 2004 2005 NA
62 2006 0 2 2004 2005 NA
62 2007 0 2 2004 2005 NA
62 2008 0 2 2004 2005 NA
62 2009 0 2 2004 2005 NA
62 2010 0 2 2004 2005 NA
62 2011 0 2 2004 2005 NA
62 2012 0 2 2004 2005 NA
62 2013 0 2 2004 2005 NA
62 2014 0 2 2004 2005 NA
## desired df
id date_year newborn stock nchild1 nchild2 nchchild3
62 1996 0 0 2004 2005 NA
62 1997 0 0 2004 2005 NA
62 1998 0 0 2004 2005 NA
62 1999 0 0 2004 2005 NA
62 2000 0 0 2004 2005 NA
62 2001 0 0 2004 2005 NA
62 2002 0 0 2004 2005 NA
62 2003 0 0 2004 2005 NA
62 2004 1 1 2004 2005 NA
62 2005 1 2 2004 2005 NA
62 2006 0 2 2004 2005 NA
62 2007 0 2 2004 2005 NA
62 2008 0 2 2004 2005 NA
62 2009 0 2 2004 2005 NA
62 2010 0 2 2004 2005 NA
62 2011 0 2 2004 2005 NA
62 2012 0 2 2004 2005 NA
62 2013 0 2 2004 2005 NA
62 2014 0 2 2004 2005 NA
So far I've tried to write code using ifelse, but I'm having trouble figuring out how to denote the condition that date_year==nchild1 | nchild2 | nchild3, etc.
df$newborn <- ifelse(df$newborn == 1, df$nchild1==df$date_year, df$newborn)
Any and all help would be appreciated, thanks so much.
Edited to make sure the dput matches the actual df.
UPDATED SOLUTION:
df <- df %>%
rowwise() %>%
mutate(newborn = ifelse(date_year %in% c(nchild1, nchild2, nchild3, nchild4, nchild5, nchild6, nchild7, nchild8, nchild9, nchild10), 1, 0)) %>%
group_by(id) %>%
mutate(stock = cumsum(newborn))