
I have a dataset with tons of factors, and I want to get the relative frequencies of one factor within the levels of another. For example, using mtcars:

library(dplyr)

mtcars$am <- as.factor(mtcars$am)
mtcars$cyl <- as.factor(mtcars$cyl)

I want the relative frequencies of the cyl levels among the cars where am == 1. Here I should get three relative frequencies, because cyl has three levels (4, 6, and 8). This code works:

mtcars %>%
  select(am, cyl) %>%
  table(.) %>% 
  prop.table(., 1) %>% 
  round(., digits = 2) %>% 
  data.frame() %>% 
  filter(am == 1) %>% 
  t() %>% 
  data.frame() %>% 
  slice(3)

# # A tibble: 1 x 3
#       X1     X2     X3
#   <fctr> <fctr> <fctr>
# 1   0.62   0.23   0.15

Running it gives the three frequencies above. Because I wrote the code myself, I know that X1 is the frequency for cyl == 4, X2 for cyl == 6, and X3 for cyl == 8.

Now I want to do this for many other binary factors like am. My plan is to build a custom function, bind all the frequencies together as rows afterwards, and create a nice table from them. Right now, I have this:

pull_freq <- function(mydata, var1, var2){
  require(tidyverse)
  var1 <- enquo(var1)
  var2 <- enquo(var2)
  mydata %>%
    select(!!var1, !!var2) %>%
    table(.) %>% 
    prop.table(., 1) %>% 
    round(., digits = 2) %>% 
    data.frame() %>% 
    filter(!!var1 == 1) %>% 
    t() %>% 
    data.frame() %>% 
    slice(3)
}

pull_freq(mtcars, am, cyl)

# # A tibble: 1 x 0

But as you can see, the function returns an empty tibble instead of the three frequencies. Any idea why, and how I can get this function to work? Thank you!
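The empty result most likely comes from operator precedence: in the rlang versions of that time, !! bound more loosely than ==, so !!var1 == 1 was read as !!(var1 == 1) and no rows survived the filter. Writing (!!var1) == 1, or UQ(var1) == 1 as the first answer below does, should avoid that. The sketch below is a hypothetical rework, assuming dplyr 1.0 or later and rlang 0.4 or later (for {{ }}). It keeps the table()/prop.table() idea but reads the var1 == 1 row directly from the proportion table, so the filter()/t()/slice() steps disappear and the columns keep their cyl labels.

```r
library(dplyr)

# Hypothetical rework of pull_freq(): the var1 == 1 row is read by name from the
# proportion table, so the result keeps the levels of var2 as column names.
pull_freq <- function(mydata, var1, var2) {
  tab <- mydata %>%
    select({{ var1 }}, {{ var2 }}) %>%
    table()                                # rows: levels of var1, columns: levels of var2
  freqs <- round(prop.table(tab, 1), 2)    # proportions within each row (each var1 level)
  # assumes var1 has a level "1"; returns a one-row data frame named by var2 level
  as.data.frame(as.list(freqs["1", ]), check.names = FALSE)
}

cars <- mtcars
cars$am <- as.factor(cars$am)
cars$cyl <- as.factor(cars$cyl)

res <- pull_freq(cars, am, cyl)
res
```

Because table() keeps every level of var2, a level with no cars in the var1 == 1 row shows up as 0 rather than disappearing.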

In your example, how do you know which value corresponds to which cyl level? - pogibas
Right now I don't have the names explicitly; I plan to add them later when I join the frequencies of all the factors I need. You can check manually by running: mtcars %>% select(am, cyl) %>% table(.) %>% prop.table(., 1) %>% round(., digits = 2) %>% data.frame() %>% filter(am == 1) %>% t() %>% data.frame(). This shows three rows, including the levels of am and cyl for each frequency. - Juan Pablo Ospina
Will you always filter to var1 == 1? It seems odd not to pass that as a parameter as well. - Frank
Yes, I'll always filter to var1 == 1. I only need these frequencies; I'm not interested in the ones where var1 == 0. - Juan Pablo Ospina
I misread the question, so I've deleted my answer. To answer the question from your comment on it ("Unrelated question: how do you show the output of your code here?"): I just copy and paste it from the console and comment it out. Nothing fancy. - Nathan Werth

3 Answers

1
vote

custom function

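myfun <- function(df, col1, col2, col3) {
  require(dplyr)
  require(tidyr)
  col1 <- enquo(col1)
  col2 <- enquo(col2)
  df %>% 
    count(!!col1, !!col2) %>%              # counts per col1/col2 combination
    group_by(!!col1) %>%
    mutate(tot = sum(n)) %>%               # total within each col1 level
    ungroup() %>%
    group_by(!!col2) %>% 
    # round here rather than at the end: round() on the whole data frame
    # fails when a column (e.g. am stored as a factor) is not numeric
    mutate(n = round(n / tot, digits = 2)) %>%
    select(-tot) %>% 
    filter(UQ(col1) == 1) %>%
    spread_(col3, "n")                     # one column per col2 level
}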

Output

myfun(mtcars, am, cyl, "cyl")

# am    `4`   `6`   `8`
#  1  0.62  0.23  0.15
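For newer package versions, here is a sketch of the same count-and-share approach without the extra string argument col3. It assumes dplyr 1.0 or later, tidyr 1.0 or later (pivot_wider() replaces the deprecated spread_()), and rlang 0.4 or later ({{ }} replaces the enquo()/!! pair); myfun2 is a hypothetical name.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant of myfun(): pivot_wider() takes the column itself,
# so no separate string argument is needed for the spread step.
myfun2 <- function(df, col1, col2) {
  df %>%
    count({{ col1 }}, {{ col2 }}) %>%               # counts per col1/col2 combination
    group_by({{ col1 }}) %>%
    mutate(n = round(n / sum(n), digits = 2)) %>%   # share of each col2 level within col1
    ungroup() %>%
    filter({{ col1 }} == 1) %>%
    pivot_wider(names_from = {{ col2 }}, values_from = n)
}

cars <- mtcars
cars$am <- as.factor(cars$am)
cars$cyl <- as.factor(cars$cyl)

res <- myfun2(cars, am, cyl)
res
```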

0
votes

Maybe I'm completely off, but is this it?

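data(mtcars)

# count rows for every cyl/am combination
agg <- aggregate(mtcars$cyl, list(mtcars$cyl, mtcars$am), FUN = length)
names(agg) <- c("cyl", "am", "count")

# share of each cyl level within its am group
agg$freq <- ave(agg$count, agg$am, FUN = function(x) x/sum(x))
# drop the count column and transpose, so each cyl/am combination becomes a column
agg <- t(agg[-3])
agg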

Note that I have not coerced cyl and am to factors with as.factor. Transposing a data frame returns a matrix, and a matrix can hold elements of only one type, so with factor columns all the values would become character and the freq values would no longer be numeric.
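A minimal sketch of that coercion, using two toy data frames whose values are purely illustrative. The same mechanism likely explains the <fctr> columns in the question's output: t() turns the filtered data frame into a character matrix, and in R versions before 4.0, data.frame() converted those strings to factors by default.

```r
# Toy data (illustrative values only): the same columns, once all numeric and
# once with am stored as a factor.
df_num <- data.frame(am = c(0, 1), freq = c(0.25, 0.75))
df_fac <- data.frame(am = factor(c(0, 1)), freq = c(0.25, 0.75))

# t() on a data frame goes through as.matrix(), and a matrix has a single type
m_num <- t(df_num)
m_fac <- t(df_fac)

typeof(m_num)    # "double": every column was numeric, so freq stays numeric
typeof(m_fac)    # "character": the factor column turns every cell into text
m_fac["freq", ]  # the frequencies are now strings
```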

0
votes

How about this:

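library(tidyverse)

getFreq <- function(data, group_var, value_var) {
    data %>%
        group_by(!!sym(group_var)) %>%    # group by the column named in group_var
        do({
            # proportions of each value_var level within the current group
            table(.[[value_var]]) %>%
                prop.table() %>%
                as_tibble()
        }) %>%
        spread(Var1, n)                   # one column per value_var level
}

getFreq(mtcars, "am", "cyl") %>% print()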

You can do any filtering afterwards, or include it inside the function.
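A sketch of the filter-inside variant, extended to loop over several binary factors as the question intends. It assumes dplyr 1.0 or later and tidyr 1.0 or later (for pivot_wider()); get_freq_row and the choice of vs as a second factor are illustrative, not part of the answer. Passing column names as strings makes it easy to iterate with lapply().

```r
library(dplyr)
library(tidyr)

# Hypothetical helper in the spirit of getFreq(): column names come in as
# strings, rlang::sym() replaces the deprecated group_by_(), and the
# group_var == 1 filter happens inside. Returns one row per call.
get_freq_row <- function(data, group_var, value_var) {
  g <- rlang::sym(group_var)
  v <- rlang::sym(value_var)
  data %>%
    count(!!g, !!v) %>%                          # counts per group/value combination
    group_by(!!g) %>%
    mutate(freq = round(n / sum(n), 2)) %>%      # share within each group_var level
    ungroup() %>%
    filter((!!g) == 1) %>%                       # parentheses keep !! scoped to g
    mutate(variable = group_var, level = as.character(!!v)) %>%
    select(variable, level, freq) %>%
    pivot_wider(names_from = level, values_from = freq)
}

# Loop over several binary factors and stack the rows; a level that never
# occurs for a factor (no cyl == 8 car has vs == 1) becomes NA in bind_rows().
factors <- c("am", "vs")
freq_table <- bind_rows(lapply(factors, get_freq_row, data = mtcars, value_var = "cyl"))
freq_table
```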