group and summarize a column for unknown variables

Question

This is an example for the data frame i have:

Location <- c("a", "a", "a", "b", "b", "b", "b")
Code <- c("70", "70", "60", "60", "60", "60", "50")
df <- data.frame(Location, Code)

I want to group and summarize the codes using this code:

df2 <- df %>% 
  dplyr::group_by(df$Code) %>%
  dplyr::summarize(Number = n()) %>%
  dplyr::mutate(Percent = round((Number/sum(Number)*100))) %>%
  dplyr::arrange(desc(Number))

I want this for every location though. And to add some difficutly; it is not clear, if there are different locations beforehand. The goal is to have an automatic function, that gives me the result for every location, by just using different data frames, where it is not clear, which locations are in it.

The result i am looking for is df2. df2 is for all locations though. i want a data frame, where the codes are listed and summarized by each location. so location "a" would have the codes 70 (2 times) and 60 (1 time) and the location "b" has 60 (3 times) and 50 (1 time).

I can not find a solution for this (only if i do it step by step, which i dont want)

Thanks in advance.

don't use $ in dplyr code. What is expected output that you are looking for? Do you want to add Location in group_by like dplyr::group_by(Location, Code) ? — Ronak Shah
Thanks for the response. The result i am looking for is df2. df2 is for all locations though. i want a data frame, where the codes are listed and summarized by each location. so location "a" would have the codes 70 (2 times) and 60 (1 time) and the location "b" has 60 (3 times) and 50 (1 time). — Dutschke

Ronak Shah Ronak Shah · Accepted Answer · 2020-04-30T08:14:07

We can count number of enteries for each Location and Code and calculate ratio for each Location using group_by.

library(dplyr)

df %>%
  count(Location, Code) %>%
  group_by(Location) %>%
  mutate(Percent = n/sum(n) * 100)

#  Location Code      n Percent
#  <chr>    <chr> <int>   <dbl>
#1 a        60        1    33.3
#2 a        70        2    66.7
#3 b        50        1    25  
#4 b        60        3    75

If we want data as separate column for each Location.

df %>% 
   count(Location, Code) %>% 
   tidyr::pivot_wider(names_from = Location, values_from = n, 
                       values_fill = list(n = 0))

In base R, you can use table for that.

table(df$Code, df$Location)

group and summarize a column for unknown variables

3 Answers