0
votes

i have a data set that resembles the table below. what i want to do is replace the NA for each ID with the available data in that respective ID apart for the outcome variable which i want to predict for. eg for ID 1, i want copy information from year 1990 to 1991, 1992, 1993. for ID 2, i should copy information from year 1992 to 1990, 1991 and 1993. The ID represents a cluster, say village. eventually i want to predict the outcome for the missing years. i want to do this in R.

   ID YeStart Author YEAR   Lat    Long Outome
     1    1990  Goroo 2012 23.45 -16.718     20
     1    1991   <NA>   NA    NA      NA     30
     1    1992   <NA>   NA    NA      NA     NA
     1    1993   <NA>   NA    NA      NA     NA
     2    1990   <NA>   NA    NA      NA      2
     2    1991   <NA>   NA    NA      NA     NA
     2    1992 Berthe 2012 20.45 -16.718     NA
     2    1993   <NA>   NA    NA      NA     NA
     3    1990   <NA>   NA    NA      NA     NA
     3    1991 Berthe 2012 40.45 -16.718     NA
     3    1992   <NA>   NA    NA      NA     NA
     3    1993   <NA>   NA    NA      NA     50
1
Hey @jonestats, you still haven't accepted the answers to your earlier questions. Take a second and figure that out. You just click the check mark next to the answer you want to accept.Matthew Plourde
let me check and do that now. i ticked a yes yesterday to indicate it was useful and i cant find that option again.jonestats

1 Answers

1
votes

I'm pretty sure the answer to this is somewhere on the site already. But you can do it with the functions merge and complete.cases.

d <- read.table(text="ID YeStart Author YEAR   Lat    Long Outome
     1    1990  Goroo 2012 23.45 -16.718     20
     1    1991   <NA>   NA    NA      NA     30
     1    1992  Goroo 2012 23.45 -16.718     NA
     1    1993   <NA>   NA    NA      NA     NA
     2    1990   <NA>   NA    NA      NA      2
     2    1991   <NA>   NA    NA      NA     NA
     2    1992 Berthe 2012 20.45 -16.718     NA
     2    1993   <NA>   NA    NA      NA     NA
     3    1990   <NA>   NA    NA      NA     NA
     3    1991 Berthe 2012 40.45 -16.718     NA
     3    1992   <NA>   NA    NA      NA     NA
     3    1993   <NA>   NA    NA      NA     50", header=TRUE)

d1 <- d[c('ID', 'YeStart', 'Outome')]
d2 <- d[! names(d) %in% c('Outome', 'YeStart')]
merge(d1, unique(d2[complete.cases(d2), ]))

#    ID YeStart Outome Author YEAR   Lat    Long
# 1   1    1990     20  Goroo 2012 23.45 -16.718
# 2   1    1991     30  Goroo 2012 23.45 -16.718
# 3   1    1992     NA  Goroo 2012 23.45 -16.718
# 4   1    1993     NA  Goroo 2012 23.45 -16.718
# 5   2    1990      2 Berthe 2012 20.45 -16.718
# 6   2    1991     NA Berthe 2012 20.45 -16.718
# 7   2    1992     NA Berthe 2012 20.45 -16.718
# 8   2    1993     NA Berthe 2012 20.45 -16.718
# 9   3    1990     NA Berthe 2012 40.45 -16.718
# 10  3    1991     NA Berthe 2012 40.45 -16.718
# 11  3    1992     NA Berthe 2012 40.45 -16.718
# 12  3    1993     50 Berthe 2012 40.45 -16.718