1
votes

I have a data frame with daily values. A sample of the data looks something like this:

data<-data.frame(day=c(1:20), score=c(8,15,8,20,40,1,6,42,81,18,55,35,37,85,66,12,32,42,22,64), value=c(1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0))

The real data set comprises ~2000 rows.

I would like to split the data frame into tibbles of 10 rows each. The first row of each tibble should be a row where value = 1.

Some rows will therefore be represented in more than one tibble.

Is it possible to do this using tidyverse packages?

Thanks in advance.


4 Answers

2
votes

Programmatically, "split into rows of 10" and "first row of each tibble ... value = 1" are two different things. I'll go with the second:

split(data, cumsum(data$value == 1))
# $`1`
#   day score value
# 1   1     8     1
# 2   2    15     0
# 3   3     8     0
# 4   4    20     0
# 5   5    40     0
# 6   6     1     0
# 7   7     6     0
# $`2`
#    day score value
# 8    8    42     1
# 9    9    81     0
# 10  10    18     0
# 11  11    55     0
# 12  12    35     0
# $`3`
#    day score value
# 13  13    37     1
# 14  14    85     0
# 15  15    66     0
# 16  16    12     0
# 17  17    32     0
# 18  18    42     0
# 19  19    22     0
# 20  20    64     0
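
The split above works because cumsum increases by one at every row where value == 1, so each such row opens a new group. A minimal sketch of that intermediate vector, together with one possible tidyverse equivalent, assuming dplyr is installed (the names grp and pieces are only illustrative):

```r
library(dplyr)

data <- data.frame(day = 1:20,
                   score = c(8,15,8,20,40,1,6,42,81,18,55,35,37,85,66,12,32,42,22,64),
                   value = c(1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0))

# Running count of rows with value == 1; this is the group id of every row.
# For the sample it is 1 for rows 1-7, 2 for rows 8-12 and 3 for rows 13-20.
grp <- cumsum(data$value == 1)

# The same split as a list of tibbles; grp is kept as an extra column
pieces <- data %>%
  group_by(grp = cumsum(value == 1)) %>%
  group_split()
```

If the data does not start with value == 1, the leading rows get group id 0 and end up in a group of their own.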

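Following Allan's alternative interpretation (10 rows starting at each value == 1), a similar approach that stops at the last row of the data instead of running past it: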

lapply(which(data$value == 1), function(i) data[i:min(nrow(data), i+9),])
# [[1]]
#    day score value
# 1    1     8     1
# 2    2    15     0
# 3    3     8     0
# 4    4    20     0
# 5    5    40     0
# 6    6     1     0
# 7    7     6     0
# 8    8    42     1
# 9    9    81     0
# 10  10    18     0
# [[2]]
#    day score value
# 8    8    42     1
# 9    9    81     0
# 10  10    18     0
# 11  11    55     0
# 12  12    35     0
# 13  13    37     1
# 14  14    85     0
# 15  15    66     0
# 16  16    12     0
# 17  17    32     0
# [[3]]
#    day score value
# 13  13    37     1
# 14  14    85     0
# 15  15    66     0
# 16  16    12     0
# 17  17    32     0
# 18  18    42     0
# 19  19    22     0
# 20  20    64     0
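
Since the question asks for tidyverse packages and tibbles, the same overlapping windows could probably be written with purrr and dplyr. This is only a sketch, assuming both packages are installed; the names starts, tbl and windows are made up for illustration:

```r
library(dplyr)
library(purrr)

data <- data.frame(day = 1:20,
                   score = c(8,15,8,20,40,1,6,42,81,18,55,35,37,85,66,12,32,42,22,64),
                   value = c(1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0))

starts <- which(data$value == 1)   # rows 1, 8 and 13 in the sample
tbl <- as_tibble(data)

# One tibble per start row: up to 10 rows, cut off at the end of the data
windows <- map(starts, function(i) slice(tbl, i:min(i + 9, nrow(tbl))))
```

Each element of windows is a tibble; the last one is shorter than 10 rows whenever its start row is close to the end of the data.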
1
votes

If I understand correctly, you want 10 consecutive rows starting from each row where value is 1, regardless of whether another 1 appears within those 10 rows. That is not splitting the data frame; it is selecting multiple overlapping subsets. This can be done with lapply and requires no additional packages. The only issue is that you will get NA rows whenever a 1 occurs within the last 9 rows of the data, because the window then runs past the end:

lapply(seq(sum(data$value)), function(i) data[which(data$value == 1)[i] + 0:9,])
#> [[1]]
#>    day score value
#> 1    1     8     1
#> 2    2    15     0
#> 3    3     8     0
#> 4    4    20     0
#> 5    5    40     0
#> 6    6     1     0
#> 7    7     6     0
#> 8    8    42     1
#> 9    9    81     0
#> 10  10    18     0
#> 
#> [[2]]
#>    day score value
#> 8    8    42     1
#> 9    9    81     0
#> 10  10    18     0
#> 11  11    55     0
#> 12  12    35     0
#> 13  13    37     1
#> 14  14    85     0
#> 15  15    66     0
#> 16  16    12     0
#> 17  17    32     0
#> 
#> [[3]]
#>      day score value
#> 13    13    37     1
#> 14    14    85     0
#> 15    15    66     0
#> 16    16    12     0
#> 17    17    32     0
#> 18    18    42     0
#> 19    19    22     0
#> 20    20    64     0
#> NA    NA    NA    NA
#> NA.1  NA    NA    NA
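
If every subset must have exactly 10 real rows, one option is to drop the start rows that are too close to the end before subsetting. This is a sketch under that assumption, not necessarily what the question intends; full_starts and full_windows are illustrative names:

```r
data <- data.frame(day = 1:20,
                   score = c(8,15,8,20,40,1,6,42,81,18,55,35,37,85,66,12,32,42,22,64),
                   value = c(1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0))

starts <- which(data$value == 1)

# Keep only start rows that leave room for a full 10-row window
full_starts <- starts[starts + 9 <= nrow(data)]

# Every window now has exactly 10 rows and no NA padding
full_windows <- lapply(full_starts, function(i) data[i + 0:9, ])
```

With the sample data this keeps the windows starting at rows 1 and 8 and discards the one starting at row 13.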
0
votes

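You can try this, which labels consecutive blocks of 10 rows with an index and numbers the rows within each block:

library(tidyverse)  # loads dplyr (mutate, group_by) and tidyr (fill)
# Create an empty grouping column
data %>% mutate(index = NA) -> data
# Row positions where each block of 10 rows starts
i <- seq(1, nrow(data), by = 10)
j <- seq_along(i)
# Label the first row of each block with its block number
data$index[i] <- j
# Fill the labels downward, then number the rows within each block
data %>% fill(index) %>% group_by(index) %>% mutate(val = row_number()) -> data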

# A tibble: 20 x 5
# Groups:   index [2]
     day score value index   val
   <int> <dbl> <dbl> <int> <int>
 1     1     8     1     1     1
 2     2    15     0     1     2
 3     3     8     0     1     3
 4     4    20     0     1     4
 5     5    40     0     1     5
 6     6     1     0     1     6
 7     7     6     0     1     7
 8     8    42     1     1     8
 9     9    81     0     1     9
10    10    18     0     1    10
11    11    55     0     2     1
12    12    35     0     2     2
13    13    37     1     2     3
14    14    85     0     2     4
15    15    66     0     2     5
16    16    12     0     2     6
17    17    32     0     2     7
18    18    42     0     2     8
19    19    22     0     2     9
20    20    64     0     2    10
0
votes

We can also split into consecutive blocks of 10 rows by creating a grouping factor with gl:

split(data, as.integer(gl(nrow(data), 10, nrow(data))))
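
To see what the gl call produces, the grouping vector can be inspected on its own. The sketch below also shows an arithmetic alternative that should give the same block ids; g, g2 and blocks are illustrative names:

```r
data <- data.frame(day = 1:20,
                   score = c(8,15,8,20,40,1,6,42,81,18,55,35,37,85,66,12,32,42,22,64),
                   value = c(1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0))

# gl(n, k, length): n levels, each repeated k times, cut to `length` values.
# For 20 rows this is ten 1s followed by ten 2s.
g <- as.integer(gl(nrow(data), 10, nrow(data)))

# Arithmetic alternative: integer-divide the zero-based row position by 10
g2 <- (seq_len(nrow(data)) - 1) %/% 10 + 1

blocks <- split(data, g)
```

Note that these blocks start at fixed positions (rows 1, 11, ...), not at the rows where value is 1.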