I'm working with a large time series dataset. I have multiple individuals (`id`) that were assayed hourly (`hour`) over many days (`date`). However, not all individuals were observed on the same dates. I would like to create a new variable (`obs`) that numbers each individual's dates from 1 to n, so that all hourly assays on the same day get the same number.
I thought I could do this easily in dplyr by using `group_by(id, date)` and `mutate()` to count each id's dates, but this just reproduces the `hour` variable, which I don't want.
```r
library(dplyr)

# what I have
id <- rep(c("id1", "id2"), each = 6)
date <- as.Date(rep(c("2018-3-13", "2018-3-14", "2018-4-11", "2018-4-12"), each = 3))
hour <- rep(1:3, 4)
data.have <- data.frame(id, date, hour)

# attempt 1: just reproduces 'hour', which I don't want
data.have %>%
  group_by(id, date) %>%
  arrange(date) %>%
  mutate(obs = 1:length(date))

# what I want
obs <- rep(1:2, each = 3, times = 2)
data.want <- data.frame(id, date, hour, obs)
```
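A minimal sketch of one way to get `obs`, assuming dplyr is installed (this approach is a suggestion, not something from the original post): group by `id` only rather than by `id` and `date`, and rank each row's date within its individual with `dplyr::dense_rank()`. `dense_rank()` gives rows with the same date the same rank and leaves no gaps between ranks, so every hourly assay on an individual's first day gets 1, on the second day 2, and so on.

```r
library(dplyr)

# Rebuild the example data from the question
id <- rep(c("id1", "id2"), each = 6)
date <- as.Date(rep(c("2018-3-13", "2018-3-14", "2018-4-11", "2018-4-12"), each = 3))
hour <- rep(1:3, 4)
data.have <- data.frame(id, date, hour)

# Group by individual only, so all of an id's rows are ranked together;
# dense_rank() assigns the same number to every row sharing a date (1, 2, ...)
data.obs <- data.have %>%
  group_by(id) %>%
  mutate(obs = dense_rank(date)) %>%
  ungroup() %>%
  arrange(id, date, hour)

print(data.obs)
```

Because `dense_rank()` ranks dates rather than counting rows, the result does not depend on how many hourly assays fall on each day, and the data do not need to be sorted beforehand.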
`data.have %>% group_by(id) %>% arrange(date) %>% mutate(reldate = date - date[1])` – January
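The comment's `reldate` counts calendar days since each individual's first date, so turning it into `obs` with `as.integer(reldate) + 1` works only when an individual's observation days are consecutive, as they happen to be in the example. The sketch below compares the two on a hypothetical copy of the data (`data.gap`, not from the question) in which id1's second observation day is moved to 2018-03-16; the `dense_rank()` column is an added suggestion, not part of the comment.

```r
library(dplyr)

id <- rep(c("id1", "id2"), each = 6)
date <- as.Date(rep(c("2018-3-13", "2018-3-14", "2018-4-11", "2018-4-12"), each = 3))
hour <- rep(1:3, 4)
data.have <- data.frame(id, date, hour)

# Hypothetical variant: id1's second observation day is 2018-03-16,
# leaving a gap of unobserved days after 2018-03-13
data.gap <- data.have
data.gap$date[4:6] <- as.Date("2018-03-16")

compare <- data.gap %>%
  group_by(id) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    reldate = date - date[1],                   # difftime: days since the id's first date
    obs_from_reldate = as.integer(reldate) + 1, # day offset shifted to start at 1
    obs = dense_rank(date)                      # consecutive observation number
  ) %>%
  ungroup()

print(compare)
```

For id1, `obs_from_reldate` jumps from 1 to 4 across the gap, while `obs` stays 1 and 2; for id2, whose days are consecutive, the two columns agree.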