packageVersion("dplyr")
#[1] ‘0.8.99.9002’

Please note that this question uses dplyr's new across() function. To install the latest development version of dplyr, run remotes::install_github("tidyverse/dplyr"). To restore the released version of dplyr, run install.packages("dplyr"). If you are reading this at some point in the future and are already on dplyr 1.X+, you won't need to worry about this note.

library(tidyverse)
df <- tibble(Date = c(rep(as.Date("2020-01-01"), 3), 
                      rep(as.Date("2020-02-01"), 2)),
             Type = c("A", "A", "B", "C", "C"),
             col1 = 1:5,
             col2 = c(0, 8, 0, 3, 0),
             col3 = c(25:29),
             colX = rep(99, 5))
df
#> # A tibble: 5 x 6
#>   Date       Type   col1  col2  col3  colX
#>   <date>     <chr> <int> <dbl> <int> <dbl>
#> 1 2020-01-01 A         1     0    25    99
#> 2 2020-01-01 A         2     8    26    99
#> 3 2020-01-01 B         3     0    27    99
#> 4 2020-02-01 C         4     3    28    99
#> 5 2020-02-01 C         5     0    29    99

I'd like to sum columns col1 through colX above over the rows of each group, grouping by "Date" and "Type". I will always start at the third column (i.e., col1), but will never know the numerical value of X in colX. That's OK because I can use the length of the data frame to determine how far 'out' I need to go to capture all columns up to the end of the data frame. Here's my approach:

df %>% 
  group_by(Date, Type) %>% 
  summarize(across(3:length(.)), sum())
#> Error: Problem with `summarise()` input `..1`.
#> x Can't subset columns that don't exist.
#> x Locations 5 and 6 don't exist.
#> i There are only 4 columns.
#> i Input `..1` is `across(3:length(.))`.
#> i The error occured in group 1: Date = 2020-01-01, Type = "A".
#> Run `rlang::last_error()` to see where the error occurred.

But it seems my usage of the base R length(.) function is improper. Am I using dplyr's new across() function in the right manner? How can I get the length of the data frame in the portion of the pipe where I need it? I'll never know how many columns there are to the end, nor are the actual names nearly as clean as my example data frame.


1 Answer

packageVersion("dplyr")
#[1] ‘0.8.99.9002’

First, you have a small syntax problem: both the column selection and the function go inside the across() call.

df %>% summarize(across(3:length(.),sum))
## A tibble: 1 x 4
#   col1  col2  col3  colX
#  <int> <dbl> <int> <dbl>
#1    15    11   135   495
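
To make the argument order explicit, here is a minimal sketch that spells out across()'s .cols and .fns arguments by name. It assumes dplyr 1.0.0 or later; the names totals and totals_na are only illustrative, and the na.rm variant is an addition that the original code does not use.

```r
library(dplyr)

df <- tibble(Date = c(rep(as.Date("2020-01-01"), 3),
                      rep(as.Date("2020-02-01"), 2)),
             Type = c("A", "A", "B", "C", "C"),
             col1 = 1:5,
             col2 = c(0, 8, 0, 3, 0),
             col3 = c(25:29),
             colX = rep(99, 5))

# The selection (.cols) and the function (.fns) are both arguments of across()
totals <- df %>%
  summarize(across(.cols = col1:colX, .fns = sum))

# A purrr-style lambda lets you pass extra arguments to the function
totals_na <- df %>%
  summarize(across(col1:colX, ~ sum(.x, na.rm = TRUE)))
```

On this data both calls return the same one-row tibble, because there are no missing values.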

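The following code does not work because across() cannot select the columns you are currently grouping by. Inside a grouped summarize(), across() sees only the four non-grouping columns (col1, col2, col3 and colX), while `.` still refers to the whole six-column data frame, so 3:length(.) asks for positions 3 through 6.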

df %>% 
   group_by(Date, Type) %>% 
   summarize(across(3:length(.), sum))
#Error: Problem with `summarise()` input `..1`.
#x Can't subset columns that don't exist.
#x Locations 5 and 6 don't exist.
#ℹ There are only 4 columns.
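
The mismatch behind that error can be checked directly. This is a rough sketch, assuming dplyr 1.0.0 or later (for the .groups argument); the variable names n_all, n_non_group and by_pos are only illustrative.

```r
library(dplyr)

df <- tibble(Date = c(rep(as.Date("2020-01-01"), 3),
                      rep(as.Date("2020-02-01"), 2)),
             Type = c("A", "A", "B", "C", "C"),
             col1 = 1:5,
             col2 = c(0, 8, 0, 3, 0),
             col3 = c(25:29),
             colX = rep(99, 5))

grouped <- df %>% group_by(Date, Type)

# length() of a data frame counts every column, grouping columns included
n_all <- length(grouped)

# across() only sees the columns that are not grouping variables
n_non_group <- length(setdiff(names(grouped), group_vars(grouped)))

# Positions inside across() are counted among the non-grouping columns,
# so 1:4 covers col1 through colX here
by_pos <- grouped %>%
  summarize(across(1:4, sum), .groups = "drop")
```

Here n_all is 6 and n_non_group is 4, which is why positions 5 and 6 are reported as missing.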

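This becomes clear when you select everything(): only the four non-grouping columns are summarized, and Date and Type appear in the result only as the group keys.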

df %>% 
   group_by(Date, Type) %>% 
   summarize(across(everything(), sum))
## A tibble: 3 x 6
## Groups:   Date [2]
#  Date       Type   col1  col2  col3  colX
#  <date>     <chr> <int> <dbl> <int> <dbl>
#1 2020-01-01 A         3     8    51   198
#2 2020-01-01 B         3     0    27    99
#3 2020-02-01 C         9     3    57   198

Another option is the starts_with() tidy-select helper.

df %>% 
  group_by(Date, Type) %>% 
  summarize(across(starts_with("col"), sum))
## A tibble: 3 x 6
## Groups:   Date [2]
#  Date       Type   col1  col2  col3  colX
#  <date>     <chr> <int> <dbl> <int> <dbl>
#1 2020-01-01 A         3     8    51   198
#2 2020-01-01 B         3     0    27    99
#3 2020-02-01 C         9     3    57   198
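
Since the question says the real column names are not this clean, a shared prefix may not be available. As a hedged sketch, assuming dplyr 1.0.0 or later (for where() and .groups), the selection can instead run from the first column of interest to the end with last_col(), or pick columns by type. The names by_range and by_numeric are only illustrative.

```r
library(dplyr)

df <- tibble(Date = c(rep(as.Date("2020-01-01"), 3),
                      rep(as.Date("2020-02-01"), 2)),
             Type = c("A", "A", "B", "C", "C"),
             col1 = 1:5,
             col2 = c(0, 8, 0, 3, 0),
             col3 = c(25:29),
             colX = rep(99, 5))

# From col1 to whatever the last column happens to be
by_range <- df %>%
  group_by(Date, Type) %>%
  summarize(across(col1:last_col(), sum), .groups = "drop")

# Every numeric column that is not a grouping variable
by_numeric <- df %>%
  group_by(Date, Type) %>%
  summarize(across(where(is.numeric), sum), .groups = "drop")
```

Neither form needs the number of columns, so length(.) is not required at all.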

The row-wise and column-wise vignettes are pretty good. The row-wise one actually discusses how group_by columns are subset.