I am trying to group a series of observations by two columns, and then create a third column with an id number. I've tried group_indices
, but that gives each combination of observations a unique number. I want the number to revert to 1
for the first observation of each group.
In my data there are a series of Sites
with a number of rows showing the calendar Day
when an observation was collected. I want to calculate the chronological day within a Site
.
library(dplyr)
# Make some data
df <- data.frame(Site = rep(c("A", "B", "C"), each = 70),
Day = as.integer(rep(c(21,22,23,24,25,26,27,1,2,3,4,5,6,7,
24,25,26,27,28,29,30), each = 10)))
# Create Day Number column (this doesn't actually work, but is the sort
# of thing I'm looking for...)
df <- df %>% group_by(Site, Day) %>%
mutate(Day.Number = group_indices(Day))
# Desired output
Site Day Day.Number
1 A 21 1
2 A 21 1
3 A 21 1
...
11 A 22 2
12 A 22 2
13 A 22 2
14 A 22 2
15 A 22 2
...
141 C 24 1
142 C 24 1
143 C 24 1
144 C 24 1
...
151 C 25 2
152 C 25 2
153 C 25 2
154 C 25 2
155 C 25 2
...
This is just a toy dataset to demonstrate the problem. Although most sites
will have ten observations of seven days
it is not always a given, so I can't just use a sequence of rep()
etc.
There is a bit of a discussion about this on github here and here but it doesn't seem to have been resolved. Any suggestions for workarounds are much appreciated.