I want to use behavioural data to calculate the number of items caught. This is my example data:

df <- data.frame(id = as.factor(c(51,51,51,51,51,51,51,52,52,52,52,52,52)), 
             type = c("(K)","(K)","(K)","(K)","","","","(K)","(K)","","(K)","","(K)"))

I would like to count the "(K)" entries according to whether they are consecutive. A run of consecutive "(K)" rows should count as one. If two "(K)" rows are separated by a gap, each counts as one, so that pair gives a tally of 2.

I hope that makes sense. For the example above, I would like my final output to look like this:

  id type tally
1 51  (K)     1
2 52  (K)     3

I thought aggregate might do this, but it counts the total number of entries in the column, so for id 51 the tally is 4 rather than 1.

Any help would be appreciated.

Thanks, Grace

3 Answers

4
votes

In base R, you could do it with rle. First split df by id, then for each subgroup count the number of runs of "(K)".

sapply(split(df, df$id), function(a)
    length(with(rle(as.character(a$type)), lengths[values == "(K)"])))
#51 52 
# 1  3 
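To see why this works, here is a minimal sketch of what rle returns for a single group. The vector x is just the type column for id 52 copied from the question, and n_runs is an illustrative name, not something from the answer.

```r
# Example vector: the type values for id 52 from the question
x <- c("(K)", "(K)", "", "(K)", "", "(K)")

# rle() collapses consecutive repeats into (value, length) pairs
r <- rle(x)
r$values   # "(K)" ""    "(K)" ""    "(K)"
r$lengths  # 2 1 1 1 1

# Every "(K)" entry in r$values is one run, so counting them gives the tally
n_runs <- sum(r$values == "(K)")
n_runs     # 3
```

The answer's length(lengths[values == "(K)"]) counts the same thing: one element per run of "(K)".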
3
votes

The rle command in base R would be useful.

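# For each id, run-length encode the logical vector (type == "(K)")
temp <- tapply(df$type, df$id, function(x) rle(x == "(K)"))
# Each TRUE in $values is one run of "(K)", so summing them counts the runs
df.new <- data.frame(id = names(temp),
                     tally = unlist(lapply(temp, function(x) sum(x$values))))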
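The same idea can be sketched more compactly by wrapping the run count in a small helper. This is only an illustration: count_runs and its target argument are hypothetical names, and the data frame is the example from the question.

```r
# Example data from the question
df <- data.frame(
  id   = as.factor(c(51, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52)),
  type = c("(K)", "(K)", "(K)", "(K)", "", "", "",
           "(K)", "(K)", "", "(K)", "", "(K)")
)

# Hypothetical helper: number of runs of `target` in x
count_runs <- function(x, target = "(K)") {
  r <- rle(as.character(x) == target)  # runs of TRUE/FALSE
  sum(r$values)                        # each TRUE value is one run of target
}

# One count per id; tapply returns a named array
tally <- tapply(df$type, df$id, count_runs)
result <- data.frame(id = names(tally), tally = as.vector(tally))
result
```

Here id 51 has a single run of four "(K)" rows, so its tally is 1, while id 52 has three separate runs, so its tally is 3.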
3
votes

We can try rleid from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by 'id', create 'val' as the run-length ID of 'type'. Then drop the rows where 'type' is blank and, grouped by 'id' and 'type', count the unique values of 'val'.

library(data.table)
setDT(df)[, val := rleid(type), id][type!="", .(tally = uniqueN(val)), .(id, type)]
#   id type tally
#1: 51  (K)     1
#2: 52  (K)     3

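Or we can use the tidyverse, building the run ID with cumsum and then counting the distinct IDs among the non-blank rows of each 'id':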

library(tidyverse)
df %>%
   mutate(val = cumsum(type != lag(type, default = type[1])))  %>% 
   group_by(id) %>% 
   filter(type!="") %>% 
   summarise(type = first(type), tally= n_distinct(val))
# A tibble: 2 × 3
#      id   type tally
#   <fctr> <fctr> <int>
#1     51    (K)     1
#2     52    (K)     3
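
Both snippets in this answer hinge on giving each run its own ID. A minimal base R sketch of that idea, using the id 52 values from the question (run_id is an illustrative name, and this only emulates the kind of result rleid gives for a single vector, it is not data.table's implementation):

```r
# Example vector: the type values for id 52 from the question
x <- c("(K)", "(K)", "", "(K)", "", "(K)")

# A new run starts wherever a value differs from the one before it
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))
run_id  # 1 1 2 3 4 5

# Count the distinct run IDs among the non-blank rows
tally <- length(unique(run_id[x != ""]))
tally   # 3
```

Filtering out the blanks after the IDs are assigned is what keeps separated "(K)" runs distinct.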