I have data where participants could have multiple points of data per day over a four day span. I am looking to recode each respective day with a value of 1-4. This could be an example subset of my data:
my.df <- read.table(text="
ID Date Variable
1 0401 9
1 0402 2
1 0403 5
1 0404 8
2 0402 1
2 0402 9
2 0403 0
2 0404 3
2 0405 2
2 0405 1", header=TRUE)
> dput(my.df)
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
Date = c(401L, 402L, 403L, 404L, 402L, 402L, 403L, 404L, 405L,
405L), Variable = c(9L, 2L, 5L, 8L, 1L, 9L, 0L, 3L, 2L, 1L
)), .Names = c("ID", "Date", "Variable"), class = "data.frame",
row.names = c(NA, -10L))
This is my desired output:
ID Date Variable DateRecode
1 0401 9 1
1 0402 2 2
1 0403 5 3
1 0404 8 4
2 0402 1 1
2 0402 9 1
2 0403 0 2
2 0404 3 3
2 0405 2 4
2 0405 1 4", header=TRUE)
I figured I need to use a lag function to create the DateRecode column because there are dozens of participants in the real dataset.
I'm able to generate a lag column using dplyr:
library(dplyr)
my.df <-
my.df %>%
group_by(ID) %>%
mutate(lag.value = dplyr::lag(Date, n = 1, default = NA))
But this, of course, does not tell R to recode anything.
The logic I'm essentially after is: When grouped by ID, if the value of Date is equal to the first/lowest value of Date, then create a new column with a value of 1. For each subsequent row, if Date is the same value as the previous row, then 1, if not, then add 1.
IF statements have not worked for me for this just yet either. I have been unable to figure out a way to account for the fact that each participant has different dates than the last, so I was hoping there would be a solution using lag.
Does anyone have any suggestions on how I might go about doing this? I've been scratching my head at this for a few days now. Thanks in advance!