0
votes

I have two tables that have the following structures. Table 1, which I will call the Summary Table, is a list of category-values with a count:

Category Value Count
Cat1 Val1
Cat1 Val2
Cat1 Val3
Cat2 Val1
Cat2 Val2
Cat3 Val1
Cat3 Val2
summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = c(NA,NA,NA,NA,NA,NA,NA))

I want to populate this table with counts gathered from Table 2, which we will call Raw Data Table, which has the following structure:

Entity Cat1 Cat2 Cat3
Ent1 Val1 Val1 Val2
Ent2 Val1 Val1 Val2
Ent3 Val2 Val2 Val1
Ent4 Val2 Val1 Val2
Ent5 Val3 Val1 Val2
Ent6 Val3 Val1 Val1
Ent7 Val3 Val2 Val2
rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'))

I want to populate the "Count" column from the summary table with the appropriate counts for each category & value pair. Programmatically, what I would want to do would be to have a counter, go through the Raw Data Table and just update the count for each value. I think this would be very inefficient in R. What I thought I would do is filter for the values but because column names are not evaluated as variables, I am at a loss of how to do this.

What I have tried (and I what I think I want something like is):

library(dplyr)
summary$Count <- nrow(rawdata %>% filter(get(summary$Category) == get(summary$Value)))

This isn't working, however. How do I get the filter to take values from another table?

1

1 Answers

1
votes

We can reshape to 'long' format with pivot_longer and use count to get the frequency count

library(dplyr)
library(tidyr)
rawdata %>% 
  pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value") %>% 
  count(Category, Value)

-output

# A tibble: 7 x 3
#  Category Value     n
#  <chr>    <chr> <int>
#1 Cat1     Val1      2
#2 Cat1     Val2      2
#3 Cat1     Val3      3
#4 Cat2     Val1      5
#5 Cat2     Val2      2
#6 Cat3     Val1      2
#7 Cat3     Val2      5

NOTE: pivot_longer reshapes the data from the 'wide' format to 'long' format. By specifying the cols = -Entity, it is converting the rest of the columns to 'long' format with the column name as "Category" specified by names_to and the corresponding values as "Value" (values_to)


Or using base R with table

subset(as.data.frame(table(data.frame(Category =
   names(rawdata)[-1][col(rawdata[-1])], 
        Value = unlist(rawdata[-1])))), Freq  > 0)