If I understand the question correctly, you want to detect when the h_no doesn't increase and then increment the class. (I'm going to walk through how I solved this problem; there is a self-contained function at the end.)
Working
We only care about the h_no column for the moment, so we can extract that from the data frame:
> h_no <- data$h_no
We want to detect when h_no doesn't go up, which we can do by working out when the difference between successive elements is either negative or zero. R provides the diff function, which gives us the vector of differences:
> d.h_no <- diff(h_no)
> d.h_no
[1] 1 1 1 -3 1 1 1 1 1 1 -6 1 1 1
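The question's data isn't reproduced in this answer, so to follow along you need something to run the steps on. Here is a minimal sketch of a stand-in data frame; the h_no values are an assumption, chosen only so that diff gives the same output as above:

```r
# Hypothetical stand-in for the question's data: only h_no matters here,
# and its values are picked so that diff() reproduces the output above.
data <- data.frame(h_no = c(1:4, 1:7, 1:4))

h_no <- data$h_no
d.h_no <- diff(h_no)   # one fewer element than h_no
print(d.h_no)
```

Note that diff returns 14 differences for the 15 values of h_no, which matters later on.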
Once we have that, it is a simple matter to find the ones that are non-positive:
> nonpos <- d.h_no <= 0
> nonpos
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
[13] FALSE FALSE
In R, TRUE and FALSE are treated as 1 and 0 in arithmetic, so if we take the cumulative sum of nonpos, it will increase by 1 in (almost) the appropriate spots. The cumsum function (which is basically the opposite of diff) can do this:
> cumsum(nonpos)
[1] 0 0 0 1 1 1 1 1 1 1 2 2 2 2
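If the logical-to-number step feels like magic, here is a small sketch on made-up vectors (flags and x are just illustrative names) showing the coercion and why cumsum can be thought of as undoing diff:

```r
# Logical values are coerced to integers in arithmetic.
flags <- c(FALSE, TRUE, FALSE, TRUE)
as.integer(flags)   # 0 1 0 1
cumsum(flags)       # 0 1 1 2

# cumsum undoes diff, provided the first element is put back in front.
x <- c(3, 5, 4, 9)
rebuilt <- cumsum(c(x[1], diff(x)))
print(rebuilt)
```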
But there are two problems: the numbers are one too small, and we are missing the first element (there should be four in the first class), because diff returns one fewer element than its input.
The first problem is simply solved: 1 + cumsum(nonpos). And the second just requires adding a 1 to the front of the vector, since the first element is always in class 1:
> classes <- c(1, 1 + cumsum(nonpos))
> classes
[1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3
Now we can attach it back onto our data frame with cbind (by using the class= syntax, we can give the column the class heading):
> data_w_classes <- cbind(data, class=classes)
And data_w_classes now contains the result.
Final result
We can compress the lines together and wrap it all up into a function to make it easier to use:
classify <- function(data) {
  cbind(data, class = c(1, 1 + cumsum(diff(data$h_no) <= 0)))
}
Or, since it makes sense for the class to be a factor:
classify <- function(data) {
  cbind(data, class = factor(c(1, 1 + cumsum(diff(data$h_no) <= 0))))
}
Use either function like this:
> classified <- classify(data) # doesn't overwrite data
> data <- classify(data) # data now has the "class" column
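As a quick sanity check, here is a sketch that runs the factor version on made-up data (both data frames are assumptions, not the question's data). It also shows that a repeated h_no value starts a new class, since the test is "less than or equal to zero":

```r
classify <- function(data) {
  cbind(data, class = factor(c(1, 1 + cumsum(diff(data$h_no) <= 0))))
}

# Hypothetical data: h_no restarts twice, so three classes are expected.
example <- data.frame(h_no = c(1:4, 1:7, 1:4))
classified <- classify(example)
sizes <- as.vector(table(classified$class))   # rows per class
print(sizes)

# A repeated value (2, 2) counts as "not increasing" and opens a new class.
flat <- classify(data.frame(h_no = c(2, 2, 3)))
print(as.character(flat$class))
```

If you would rather not go through cbind, assigning the column directly with data$class <- ... should give the same column; which one reads better is a matter of taste.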
(This approach avoids explicit iteration, which is generally recommended in R, and avoids generating lots of intermediate vectors, lists, etc. It's also kinda neat that it can be written on one line :) )