I'm attempting to lag data to the previous day in R. However in the dataset there are multiple observations for each day. How do I achieve this?
I have looked at using dplyr to lag the variable and using an if statement to achieve this, however this would require around 8 nested if statements to ensure that all observations are lagged to the previous day.
df <- df %>% dplyr::group_by(HomeTeam) %>%
dplyr::arrange(Date) %>%
dplyr::mutate(Score = ifelse(lag(Date) != Date, lag(Score),
ifelse(lag(Date, n = 2) != lag(Date),
lag(Score,n = 2), ifelse...)))
df <- data.frame(HomeTeam = c("Wolves", "Wolves", "Wolves"), Date = c("2019-08-20", "2019-08-20", "2019-08-19")
Input Data
HomeTeam Date Score
Wolves 2019-08-20 3
Wolves 2019-08-20 1
Wolves 2019-08-19 4
Ouput Data
HomeTeam Date Score
Wolves 2019-08-20 4
Wolves 2019-08-20 4
Wolves 2019-08-19 NA
Solution
df <- data.frame(HomeTeam = c("Wolves", "Wolves",
"Wolves","Wolves","Wolves", "Man Utd", "Man Utd", "Man Utd"), Date =
c("2019-08-20", "2019-08-20", "2019-08-19", "2019-08-19", "2019-08-15",
"2019-06-01", "2019-06-01", "2019-04-01"), Score = c(3,1,2,2,4,5,6,7))
df %>% dplyr::mutate(Date = as.Date(Date)) %>%
dplyr::arrange(Date)%>%
dplyr::group_by(HomeTeam) %>%
dplyr::mutate(lagScore = lag(Score)) %>%
dplyr::arrange(Date) %>%
dplyr::group_by(Date,HomeTeam) %>%
dplyr::mutate(lagScore = lagScore[1]) %>%
dplyr::ungroup()
# A tibble: 8 x 4
# HomeTeam Date Score lagScore
# <fct> <date> <dbl> <dbl>
# Man Utd 2019-04-01 7 NA
# Man Utd 2019-06-01 5 7
# Man Utd 2019-06-01 6 7
# Wolves 2019-08-15 4 NA
# Wolves 2019-08-19 2 4
# Wolves 2019-08-19 2 4
# Wolves 2019-08-20 3 2
# Wolves 2019-08-20 1 2
"Date"class, thenDate - 1; if"POSIXt"class, thenDate - 60*60*24will almost always be right. - r2evans