I've got data from a number of surveys. Each survey can be sent multiple times with updated values. For each survey/row in the dataset there's a date when the survey was submited (created). I'd like to merge the rows for each survey and keep the date from the first survey but other data from the last survey.
A simple example:
#> survey created var1 var2
#> 1 s1 2020-01-01 10 30
#> 2 s2 2020-01-02 10 90
#> 3 s2 2020-01-03 20 20
#> 4 s3 2020-01-01 45 5
#> 5 s3 2020-01-02 50 50
#> 6 s3 2020-01-03 30 10
Desired result:
#> survey created var1 var2
#> 1 s1 2020-01-01 10 30
#> 2 s2 2020-01-02 20 20
#> 3 s3 2020-01-01 30 10
Example data:
df <- data.frame(survey = c("s1", "s2", "s2", "s3", "s3", "s3"),
created = as.POSIXct(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-02", "2020-01-03"), "%Y-%m-%d", tz = "GMT"),
var1 = c(10, 10, 20, 45, 50, 30),
var2 = c(30, 90, 20, 5, 50, 10),
stringsAsFactors=FALSE)
I've tried group_by
with summarize
in different ways but can't make it work, any help would be highly appreciated!