1
votes

I'm attempting to lag data to the previous day in R. However in the dataset there are multiple observations for each day. How do I achieve this?

I have looked at using dplyr to lag the variable and using an if statement to achieve this, however this would require around 8 nested if statements to ensure that all observations are lagged to the previous day.

  df <- df %>% dplyr::group_by(HomeTeam) %>% 
 dplyr::arrange(Date) %>%
 dplyr::mutate(Score = ifelse(lag(Date) != Date, lag(Score), 
                          ifelse(lag(Date, n = 2) != lag(Date), 
                                  lag(Score,n = 2), ifelse...)))

df <- data.frame(HomeTeam = c("Wolves", "Wolves", "Wolves"), Date = c("2019-08-20", "2019-08-20", "2019-08-19")

Input Data

HomeTeam       Date           Score
Wolves         2019-08-20     3
Wolves         2019-08-20     1
Wolves         2019-08-19     4

Ouput Data

HomeTeam       Date           Score
Wolves         2019-08-20     4
Wolves         2019-08-20     4
Wolves         2019-08-19     NA

Solution

df <- data.frame(HomeTeam = c("Wolves", "Wolves",             
"Wolves","Wolves","Wolves", "Man Utd", "Man Utd", "Man Utd"), Date =         
c("2019-08-20", "2019-08-20", "2019-08-19", "2019-08-19", "2019-08-15", 
"2019-06-01", "2019-06-01", "2019-04-01"), Score = c(3,1,2,2,4,5,6,7))

df %>% dplyr::mutate(Date = as.Date(Date)) %>%
  dplyr::arrange(Date)%>% 
  dplyr::group_by(HomeTeam) %>%
  dplyr::mutate(lagScore = lag(Score)) %>%
  dplyr::arrange(Date) %>%
  dplyr::group_by(Date,HomeTeam) %>%
  dplyr::mutate(lagScore = lagScore[1]) %>%
  dplyr::ungroup()

# A tibble: 8 x 4
#  HomeTeam Date       Score lagScore
#  <fct>    <date>     <dbl>    <dbl>
# Man Utd  2019-04-01     7       NA
# Man Utd  2019-06-01     5        7
# Man Utd  2019-06-01     6        7
# Wolves   2019-08-15     4       NA
# Wolves   2019-08-19     2        4
# Wolves   2019-08-19     2        4
# Wolves   2019-08-20     3        2
# Wolves   2019-08-20     1        2               
1
Please provide example input data and expected output. - zx8754
If you have multiple occurrences, have you tried simply subtracting 1 day? If "Date" class, then Date - 1; if "POSIXt" class, then Date - 60*60*24 will almost always be right. - r2evans
One question, just to be clear: When you have more than one observation for the same date, the score variable is always the same in all of them? - Johan Rosa

1 Answers

2
votes

Assuming you are lagging to the previously occurring date (that may not be "yesterday"), then try this:

library(dplyr)
set.seed(2)
data.frame(
  id = 1:10,
  Date = Sys.Date() + sort(sample(c(0, 1, 3), size=10, replace=TRUE))
) %>%
  mutate(lagDate = lag(Date)) %>%
  group_by(Date) %>%
  mutate(lagDate = lagDate[1]) %>%
  ungroup()
# # A tibble: 10 x 3
#       id Date       lagDate   
#    <int> <date>     <date>    
#  1     1 2019-08-20 NA        
#  2     2 2019-08-20 NA        
#  3     3 2019-08-20 NA        
#  4     4 2019-08-21 2019-08-20
#  5     5 2019-08-21 2019-08-20
#  6     6 2019-08-21 2019-08-20
#  7     7 2019-08-23 2019-08-21
#  8     8 2019-08-23 2019-08-21
#  9     9 2019-08-23 2019-08-21
# 10    10 2019-08-23 2019-08-21

Seeing a little more, try this:

df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
HomeTeam       Date           Score
Wolves         2019-08-21     96
Wolves         2019-08-21     97
Wolves         2019-08-20     3
Wolves         2019-08-20     1
Wolves         2019-08-19     4")
df$Date <- as.Date(df$Date)

df %>%
  mutate(lagDate = lag(Date)) %>%
  group_by(HomeTeam, Date) %>%
  summarize(lagDate = lagDate[1], Score = Score[1]) %>%
  ungroup() %>%
  select(HomeTeam, Date = lagDate, Score) %>%
  right_join(select(df, -Score), by = c("HomeTeam", "Date"))
# # A tibble: 5 x 3
#   HomeTeam Date       Score
#   <chr>    <date>     <int>
# 1 Wolves   2019-08-21     3
# 2 Wolves   2019-08-21     3
# 3 Wolves   2019-08-20     4
# 4 Wolves   2019-08-20     4
# 5 Wolves   2019-08-19    NA