0
votes

For presentation purposes I often have to format a data frame with column and row totals and percentages.

Piping the row totals and percentages conditionally is straight forward: stackoverflow e.g.

The column totals can be neatly piped:

option 1: stackoverflow e.g.

option 2: using the janitor package function adorn_totals (however I'd prefer to find a way without adding more packages to my workflow).

I get stuck on the next step which is to add a column % row below the column total. This row calculates the column sum (column total) as a percentage of table sum (table total).

Here I have to split my workflow to do the following:

  1. create a table total variable
  2. function to calculate the percentage of a vector
  3. calculate the column percentage row
  4. bind the column percentage row to the table

This process feels heavy handed and I am sure there is a better way; suggestions welcome.

This is what I am aiming to achieve This is what I am aiming to achieve

Once the table is generated formatting and tidying up for presentation purposes I usually do with flextable or kableExtra as a second pass.

MWE

library(tidyverse)

tib <- tibble(v1 = c("a", "b", "c"),
              v2 = 1:3,
              v3 = 4:6)

# piping row summaries and column totals
tib <- 
  tib %>% 
  mutate(r_sum = rowSums(.[2:3]),
         r_pc = r_sum * 100/sum(r_sum)) %>% 
  bind_rows(summarise_all(., funs(if(is.numeric(.)) sum(.) else "Total")))


# extract gross total
table_total <- tib$r_sum[4]

# function to calculate percentage * 2 as tib includes a column total row
calc_pc <- function(x) {sum(x) * 100 / (table_total * 2)}

# calculate column percentages
col_pc <- 
  tib %>% 
  summarise_at(vars(v1:r_sum), funs(if(is.numeric(.)) calc_pc(.) else "Column %"))

# finally bringing it all together for the desired result
tib <- 
  tib %>% 
  bind_rows(col_pc)



1

1 Answers

1
votes

Using janitor, we can do everything once we have a precalculated total.

library(janitor, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

tib <- tibble(v1 = c("a", "b", "c"), v2 = 1:3, v3 = 4:6)

total <- tib %>% select(where(is.numeric)) %>% sum

tib %>% 
  adorn_totals(c("row", "col")) %>% 
  rowwise() %>%
  mutate("Row %" = round(sum(across(where(is.numeric)))/total*50)) %>%
  ungroup %>%
  bind_rows(summarise(., across(where(is.numeric), ~round(sum(.)/total*50)))) %>%
  `[[<-`(nrow(.), 1, value = "Column %") %>%
  `[[<-`(nrow(.), ncol(.), value = NA)
#> # A tibble: 5 x 5
#>   v1          v2    v3 Total `Row %`
#>   <chr>    <dbl> <dbl> <dbl>   <dbl>
#> 1 a            1     4     5      24
#> 2 b            2     5     7      33
#> 3 c            3     6     9      43
#> 4 Total        6    15    21     100
#> 5 Column %    29    71   100      NA

Created on 2020-05-30 by the reprex package (v0.3.0)

Or slightly longer without janitor:

library(dplyr, warn.conflicts = FALSE)

tib <- tibble(v1 = c("a", "b", "c"), v2 = 1:3, v3 = 4:6)

total <- tib %>% select(where(is.numeric)) %>% sum

tib %>% 
  rowwise() %>%
  mutate(
    Total = sum(across(where(is.numeric))),
    "Row %" = round(sum(across(where(is.numeric)))/total*50)
  ) %>%
  ungroup %>%
  bind_rows(summarise(., across(where(is.numeric), sum))) %>%
  `[[<-`(nrow(.), 1, value = "Total") %>%
  bind_rows(summarise(., across(where(is.numeric), ~round(sum(.)/total*50)))) %>%
  `[[<-`(nrow(.), 1, value = "Column %") %>%
  `[[<-`(nrow(.), ncol(.), value= NA)
#> # A tibble: 5 x 5
#>   v1          v2    v3 Total `Row %`
#>   <chr>    <dbl> <dbl> <dbl>   <dbl>
#> 1 a            1     4     5      24
#> 2 b            2     5     7      33
#> 3 c            3     6     9      43
#> 4 Total        6    15    21     100
#> 5 Column %    29    71   100      NA

Created on 2020-05-30 by the reprex package (v0.3.0)

Both can me made a bit shorter if you don't care about the row names, of course.