R Recode variables using lag

Question

I have data where participants could have multiple points of data per day over a four day span. I am looking to recode each respective day with a value of 1-4. This could be an example subset of my data:

my.df <- read.table(text="
ID Date  Variable
1  0401  9
1  0402  2
1  0403  5
1  0404  8
2  0402  1
2  0402  9
2  0403  0
2  0404  3
2  0405  2
2  0405  1", header=TRUE)

> dput(my.df)
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), 
    Date = c(401L, 402L, 403L, 404L, 402L, 402L, 403L, 404L, 405L, 
    405L), Variable = c(9L, 2L, 5L, 8L, 1L, 9L, 0L, 3L, 2L, 1L
    )), .Names = c("ID", "Date", "Variable"), class = "data.frame", 
row.names = c(NA, -10L))

This is my desired output:

ID Date  Variable DateRecode 
1  0401  9     1
1  0402  2     2
1  0403  5     3
1  0404  8     4
2  0402  1     1
2  0402  9     1
2  0403  0     2
2  0404  3     3
2  0405  2     4
2  0405  1     4", header=TRUE)

I figured I need to use a lag function to create the DateRecode column because there are dozens of participants in the real dataset.

I'm able to generate a lag column using dplyr:

library(dplyr)
my.df <- 
  my.df %>%
  group_by(ID) %>%
  mutate(lag.value = dplyr::lag(Date, n = 1, default = NA))

But this, of course, does not tell R to recode anything.

The logic I'm essentially after is: When grouped by ID, if the value of Date is equal to the first/lowest value of Date, then create a new column with a value of 1. For each subsequent row, if Date is the same value as the previous row, then 1, if not, then add 1.

IF statements have not worked for me for this just yet either. I have been unable to figure out a way to account for the fact that each participant has different dates than the last, so I was hoping there would be a solution using lag.

Does anyone have any suggestions on how I might go about doing this? I've been scratching my head at this for a few days now. Thanks in advance!

akrun akrun · Accepted Answer · 2018-06-08T03:44:19

We can do this with match

library(dplyr)
my.df %>% 
   group_by(ID) %>% 
   mutate(lag.value = match(Date, unique(Date)))
# A tibble: 10 x 4
# Groups:   ID [2]
#      ID  Date Variable lag.value
#   <int> <int>    <int>     <int>
# 1     1   401        9         1
# 2     1   402        2         2
# 3     1   403        5         3
# 4     1   404        8         4
# 5     2   402        1         1
# 6     2   402        9         1
# 7     2   403        0         2
# 8     2   404        3         3
# 9     2   405        2         4
#10     2   405        1         4

Or use factor and coerce it to integer

my.df  %>%
  group_by(ID) %>%
  mutate(lag.value = as.integer(factor(Date)))

Or another option is group_indices

library(purrr)
my.df %>% 
  split(.$ID) %>%
  map_df(~ .x %>% mutate(lag.value = group_indices(., Date)))
#   ID Date Variable lag.value
#1   1  401        9         1
#2   1  402        2         2
#3   1  403        5         3
#4   1  404        8         4
#5   2  402        1         1
#6   2  402        9         1
#7   2  403        0         2
#8   2  404        3         3
#9   2  405        2         4
#10  2  405        1         4

NOTE: Here the 'Date' is in order. If it is not, then do an arrange, then do the group_by

my.df %>%
   arrange(ID, Date) %>%
   group_by(ID) %>%
   mutate(lag.value = match(Date, unique(Date)))

R Recode variables using lag

2 Answers