
This is the example data frame:

Codes <- c("70", "70", "60", "60", "60", "60", "50")

Locations <- c("a", "a", "a", "b", "b", "b", "b")

df <- data.frame(Cases, Codes, Locations) 

I want to group and summarize the codes for each location. It has to be a function, though, that works with an unknown number of locations. The result should be a data frame (or one data frame per location) that shows the number of cases for each code in each location.

I know that this is simple if the locations are known: just filter the data frame for each location and use dplyr::group_by and dplyr::summarize. But I want an automatic function, where I do not know beforehand how many different locations there are.

I tried to do it with dplyr::group_split, but that returns a list of tibbles, on which I can't call dplyr::group_by.

This is the expected output:

      Codes     Location A           Codes      Location B
      70            2                60            3
      60            1                50            1

Thanks in advance for answering; I am struggling with this big time.

Can you show your expected output? Is this what you want? df %>% count(Locations) - Ronak Shah
I guess I would need two columns as output, one for each location (a and b). Each column should list the number of codes it contains, from highest to lowest. I think that is only possible with two different data frames, because otherwise they can't be sorted independently. The problem is that there might be more than two locations, or maybe none. - Dutschke
If you could update your post with the output you expect it would be helpful. Perhaps, you mean df %>% group_by(Locations) %>% summarise(codes = toString(sort(Codes))) - Ronak Shah
Edited it. Maybe it's clearer now. - Dutschke
If you add sort = TRUE in count it will be sorted. df_list <- df %>% count(Locations, Codes, sort = TRUE) %>% group_split(Locations) - Ronak Shah

1 Answer


We could use count and then split the data frame by Locations to get a list of data frames, one per location.

library(dplyr)
df_list <- df %>% count(Locations, Codes, sort = TRUE) %>% group_split(Locations)

#[[1]]
# A tibble: 2 x 3
#  Locations Codes     n
#  <chr>     <chr> <int>
#1 a         70        2
#2 a         60        1

#[[2]]
# A tibble: 2 x 3
#  Locations Codes     n
#  <chr>     <chr> <int>
#1 b         60        3
#2 b         50        1
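
Since the question asks for a function, the same idea could be wrapped in one. This is only a minimal sketch under a few assumptions: `count_codes_by_location` is a made-up name, the example data frame is rebuilt without `Cases` (its values are not shown in the question), and base `split()` is swapped in for `group_split()` so that each list element is named after its location.

```r
library(dplyr)

# Example data from the question; `Cases` is left out because its values
# are not shown there (assumption: it is not needed for the counts).
df <- data.frame(
  Codes     = c("70", "70", "60", "60", "60", "60", "50"),
  Locations = c("a", "a", "a", "b", "b", "b", "b")
)

# Hypothetical helper: returns one count table per location,
# however many locations the data contains.
count_codes_by_location <- function(data) {
  # Count rows per Location/Code pair, highest counts first
  counts <- data %>% count(Locations, Codes, sort = TRUE)
  # Base split() keeps the row order and names each element by location
  split(counts, counts$Locations)
}

res <- count_codes_by_location(df)
names(res)  # one entry per location found in the data
res$b       # counts for location "b"
```

Individual tables can then be pulled out by name (for example `res$a`), or the whole list can be processed with `lapply()` without knowing the number of locations in advance.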