First of all, it is always advisable to give a sample of the data in a more friendly format using dput(yourData)
so that other can easily regenerate your data. Here is the output of dput()
you could better be sharing:
> dput(dat)
structure(list(site = c(1, 1, 1, 1, 1, 2, 2, 2, 2), ID = c(16,
46, 26, 89, 12, 14, 18, 19, 39), start.date = structure(c(17310,
17286, 17286, 17291, 17297, 17290, 17295, 17310, 17291), class = "Date"),
end.date = structure(c(17322, 17306, 17309, 17299, 17300,
17296, 17315, 17327, 17304), class = "Date")), class = "data.frame", row.names =
c(NA,
-9L))
To do this easily we first need to unpack the start.date
and end.date
to individual dates:
newDat <- data.frame()
for (i in 1:nrow(dat)){
expand <- data.frame(site = dat$site[i],
ID = dat$ID[i],
Dates = seq.Date(dat$start.date[i], dat$end.date[i], 1))
newDat <- rbind(newDat, expand)
}
newDat
site ID Dates
1 1 16 2017-05-24
2 1 16 2017-05-25
3 1 16 2017-05-26
4 1 16 2017-05-27
5 1 16 2017-05-28
6 1 16 2017-05-29
7 1 16 2017-05-30
. . .
. . .
Then we calculate the number of other individuals present in each site in each day:
individualCount = newDat %>%
group_by(site, Dates) %>%
summarise(individuals = n_distinct(ID) - 1)
individualCount
# A tibble: 75 x 3
# Groups: site [?]
site Dates individuals
<dbl> <date> <int>
1 1 2017-04-30 1
2 1 2017-05-01 1
3 1 2017-05-02 1
4 1 2017-05-03 1
5 1 2017-05-04 1
6 1 2017-05-05 2
7 1 2017-05-06 2
8 1 2017-05-07 2
9 1 2017-05-08 2
10 1 2017-05-09 2
# ... with 65 more rows
Then, we augment our data with the new information using left_join()
and calculate the required average:
newDat <- left_join(newDat, individualCount, by = c("site", "Dates")) %>%
group_by(site, ID) %>%
summarise(duration = max(Dates) - min(Dates)+1,
av.individuals = mean(individuals))
newDat
# A tibble: 9 x 4
# Groups: site [?]
site ID duration av.individuals
<dbl> <dbl> <time> <dbl>
1 1 12 4 0.75
2 1 16 13 0
3 1 26 24 1.42
4 1 46 21 1.62
5 1 89 9 1.33
6 2 14 7 1.14
7 2 18 21 0.875
8 2 19 18 0.333
9 2 39 14 1.14
The final step is to add the required column to the original dataset (dat
) again with left_join()
:
dat %>% left_join(newDat, by = c("site", "ID"))
dat
site ID start.date end.date duration av.individuals
1 1 16 2017-05-24 2017-06-05 13 days 0.000000
2 1 46 2017-04-30 2017-05-20 21 days 1.619048
3 1 26 2017-04-30 2017-05-23 24 days 1.416667
4 1 89 2017-05-05 2017-05-13 9 days 2.333333
5 1 12 2017-05-11 2017-05-14 4 days 2.750000
6 2 14 2017-05-04 2017-05-10 7 days 1.142857
7 2 18 2017-05-09 2017-05-29 21 days 0.857143
8 2 19 2017-05-24 2017-06-10 18 days 0.333333
9 2 39 2017-05-05 2017-05-18 14 days 1.142857