Creating distribution table for multiple columns in data frame

Question

I have the following dummy data:

a <- c(4,8,8,2,8,4,2,16)
b <- c(16,32,2,1,0.5,64,2,1)
c <- c(0.25,0.25,0.12,1,4,8,16,16)

df<-data.frame(a,b,c,stringsAsFactors = FALSE)

(NOTE: The data frame I want to do this on has alot of other columns as well, which isn't relevant to this specific analysis)

I want to end up with something like this:

   n  0.12  0.25   0.5   1    2   4    8    16   32   64  
a  8  0     0      0     0    25  25   37.5 12.5 0    0
b  8  0     0      12.5  25   25  0    0    12.5 12.5 12.5
c  8  12.5  25     0     12.5 0   12.5 12.5 25   0    0

Where the total amount of rows in the data frame for each corresponding column is n, and the percent distribution of each value in the next columns (the sum for each row, from column "0.12" to "64" equals 100%)

I have managed to do this for one column at a time:

library(dplyr)    

test <- df %>%
    count(a) %>%
    mutate(perc = n/sum(n)*100) %>%
    select(-n) %>%
    spread(a, perc)

However, with this, I lose the "n" column.

I have tried to use count() for all the columns I want to include, but this counts each observation in relation to the other one's being counted, resulting in a much bigger table than I am looking for. I also want to keep all columns regardless of them being 0.

Roman Roman · Accepted Answer · 2017-10-24T15:43:44

You can also try:

df %>% 
  gather() %>% 
  group_by(key) %>% 
  mutate(n1=n()) %>% 
  count(value, n1) %>% 
  mutate(perc = n/sum(n)*100) %>% 
  select(-n, n=n1) %>% 
  spread(value, perc, fill = 0) 
# A tibble: 3 x 12
# Groups:   key [3]
    key    n  `0.12` `0.25` `0.5`   `1`   `2`   `4`   `8`  `16`  `32`  `64`
* <chr> <int>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     a     8    0.0      0   0.0   0.0    25  25.0  37.5  12.5   0.0   0.0
2     b     8    0.0      0  12.5  25.0    25   0.0   0.0  12.5  12.5  12.5
3     c     8   12.5     25   0.0  12.5     0  12.5  12.5  25.0   0.0   0.0

Creating distribution table for multiple columns in data frame

2 Answers