Create a group number for each consecutive sequence

Question

I have the data.frame below. I want to add a column 'g' that classifies my data according to consecutive sequences in column h_no. That is, the first sequence of h_no 1, 2, 3, 4 is group 1, the second series of h_no (1 to 7) is group 2, and so on, as indicated in the last column 'g'.

h_no   h_freq    h_freqsq g
1     0.09091 0.008264628 1
2     0.00000 0.000000000 1
3     0.04545 0.002065702 1
4     0.00000 0.000000000 1  
1     0.13636 0.018594050 2
2     0.00000 0.000000000 2
3     0.00000 0.000000000 2
4     0.04545 0.002065702 2
5     0.31818 0.101238512 2
6     0.00000 0.000000000 2
7     0.50000 0.250000000 2 
1     0.13636 0.018594050 3 
2     0.09091 0.008264628 3
3     0.40909 0.167354628 3
4     0.04545 0.002065702 3

See also Create grouping variable for consecutive sequences and split vector — Henrik

Roman Luštrik Roman Luštrik · Accepted Answer · 2012-04-14T07:56:20

You can add a column to your data using various techniques. The quotes below come from the "Details" section of the relevant help text, [[.data.frame.

Data frames can be indexed in several modes. When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list.

my.dataframe["new.col"] <- a.vector
my.dataframe[["new.col"]] <- a.vector

The data.frame method for $, treats x as a list

my.dataframe$new.col <- a.vector

When [ and [[ are used with two indices (x[i, j] and x[[i, j]]) they act like indexing a matrix

my.dataframe[ , "new.col"] <- a.vector

Since the method for data.frame assumes that if you don't specify if you're working with columns or rows, it will assume you mean columns.

For your example, this should work:

# make some fake data
your.df <- data.frame(no = c(1:4, 1:7, 1:5), h_freq = runif(16), h_freqsq = runif(16))

# find where one appears and 
from <- which(your.df$no == 1)
to <- c((from-1)[-1], nrow(your.df)) # up to which point the sequence runs

# generate a sequence (len) and based on its length, repeat a consecutive number len times
get.seq <- mapply(from, to, 1:length(from), FUN = function(x, y, z) {
            len <- length(seq(from = x[1], to = y[1]))
            return(rep(z, times = len))
         })

# when we unlist, we get a vector
your.df$group <- unlist(get.seq)
# and append it to your original data.frame. since this is
# designating a group, it makes sense to make it a factor
your.df$group <- as.factor(your.df$group)


   no     h_freq   h_freqsq group
1   1 0.40998238 0.06463876     1
2   2 0.98086928 0.33093795     1
3   3 0.28908651 0.74077119     1
4   4 0.10476768 0.56784786     1
5   1 0.75478995 0.60479945     2
6   2 0.26974011 0.95231761     2
7   3 0.53676266 0.74370154     2
8   4 0.99784066 0.37499294     2
9   5 0.89771767 0.83467805     2
10  6 0.05363139 0.32066178     2
11  7 0.71741529 0.84572717     2
12  1 0.10654430 0.32917711     3
13  2 0.41971959 0.87155514     3
14  3 0.32432646 0.65789294     3
15  4 0.77896780 0.27599187     3
16  5 0.06100008 0.55399326     3

Create a group number for each consecutive sequence

8 Answers

Working

Final result