2
votes

I am trying to group a series of observations by two columns, and then create a third column with an id number. I've tried group_indices, but that gives each combination of observations a unique number. I want the number to revert to 1 for the first observation of each group.

In my data there are a series of Sites with a number of rows showing the calendar Day when an observation was collected. I want to calculate the chronological day within a Site.

library(dplyr)

# Make some data
df <- data.frame(Site = rep(c("A", "B", "C"), each = 70),
            Day = as.integer(rep(c(21,22,23,24,25,26,27,1,2,3,4,5,6,7,
                      24,25,26,27,28,29,30), each = 10)))

# Create Day Number column (this doesn't actually work, but is the sort
# of thing I'm looking for...)
df <- df %>% group_by(Site, Day) %>% 
                mutate(Day.Number = group_indices(Day))

# Desired output
     Site Day Day.Number
1      A  21     1
2      A  21     1
3      A  21     1
...
11     A  22     2
12     A  22     2
13     A  22     2
14     A  22     2
15     A  22     2
...
141    C  24     1
142    C  24     1
143    C  24     1
144    C  24     1
...
151    C  25     2
152    C  25     2
153    C  25     2
154    C  25     2
155    C  25     2
...

This is just a toy dataset to demonstrate the problem. Although most sites will have ten observations of seven days it is not always a given, so I can't just use a sequence of rep() etc.

There is a bit of a discussion about this on github here and here but it doesn't seem to have been resolved. Any suggestions for workarounds are much appreciated.

1

1 Answers

3
votes

Here's one way to do it:

df <- df %>% 
        left_join(unique(df) %>% group_by(Site) %>% mutate(Day.Number=1:n()))

head(df)
#   Site Day Day.Number
# 1    A  21          1
# 2    A  21          1
# 3    A  21          1
# 4    A  21          1
# 5    A  21          1
# 6    A  21          1