1
votes

I have the following dummy data:

a <- c(4,8,8,2,8,4,2,16)
b <- c(16,32,2,1,0.5,64,2,1)
c <- c(0.25,0.25,0.12,1,4,8,16,16)

df<-data.frame(a,b,c,stringsAsFactors = FALSE)

(NOTE: The data frame I want to do this on has alot of other columns as well, which isn't relevant to this specific analysis)

I want to end up with something like this:

   n  0.12  0.25   0.5   1    2   4    8    16   32   64  
a  8  0     0      0     0    25  25   37.5 12.5 0    0
b  8  0     0      12.5  25   25  0    0    12.5 12.5 12.5
c  8  12.5  25     0     12.5 0   12.5 12.5 25   0    0

Where the total amount of rows in the data frame for each corresponding column is n, and the percent distribution of each value in the next columns (the sum for each row, from column "0.12" to "64" equals 100%)

I have managed to do this for one column at a time:

library(dplyr)    

test <- df %>%
    count(a) %>%
    mutate(perc = n/sum(n)*100) %>%
    select(-n) %>%
    spread(a, perc)

However, with this, I lose the "n" column.

I have tried to use count() for all the columns I want to include, but this counts each observation in relation to the other one's being counted, resulting in a much bigger table than I am looking for. I also want to keep all columns regardless of them being 0.

2

2 Answers

2
votes

You can also try:

df %>% 
  gather() %>% 
  group_by(key) %>% 
  mutate(n1=n()) %>% 
  count(value, n1) %>% 
  mutate(perc = n/sum(n)*100) %>% 
  select(-n, n=n1) %>% 
  spread(value, perc, fill = 0) 
# A tibble: 3 x 12
# Groups:   key [3]
    key    n  `0.12` `0.25` `0.5`   `1`   `2`   `4`   `8`  `16`  `32`  `64`
* <chr> <int>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     a     8    0.0      0   0.0   0.0    25  25.0  37.5  12.5   0.0   0.0
2     b     8    0.0      0  12.5  25.0    25   0.0   0.0  12.5  12.5  12.5
3     c     8   12.5     25   0.0  12.5     0  12.5  12.5  25.0   0.0   0.0
0
votes

We could use map

library(tidyverse)
df %>%
    map(~tibble(.x) %>%
         count(.x) %>% 
         mutate(perc = n/sum(n)*100) %>% 
         select(-n) %>% 
         spread(.x, perc)) %>%
     bind_rows %>% 
     bind_cols(gather(df) %>% 
                   count(key), .) %>%
     mutate_if(is.numeric, funs(replace(., is.na(.), 0))) 
# A tibble: 3 x 12
#    key     n   `2`   `4`   `8`  `16` `0.5`   `1`  `32`  `64` `0.12` `0.25`
#  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#1     a     8    25  25.0  37.5  12.5   0.0   0.0   0.0   0.0    0.0      0
#2     b     8    25   0.0   0.0  12.5  12.5  25.0  12.5  12.5    0.0      0
#3     c     8     0  12.5  12.5  25.0   0.0  12.5   0.0   0.0   12.5     25