R: A way to filter based on values in another table?

Question

I have two tables that have the following structures. Table 1, which I will call the Summary Table, is a list of category-values with a count:

Category	Value	Count
Cat1	Val1
Cat1	Val2
Cat1	Val3
Cat2	Val1
Cat2	Val2
Cat3	Val1
Cat3	Val2

summary <- data.frame(Category = c('Cat1', 'Cat1', 'Cat1', 'Cat2', 'Cat2', 'Cat3', 'Cat3'),
                      Value = c('Val1', 'Val2', 'Val3', 'Val1', 'Val2', 'Val1', 'Val2'),
                      Count = c(NA,NA,NA,NA,NA,NA,NA))

I want to populate this table with counts gathered from Table 2, which we will call Raw Data Table, which has the following structure:

Entity	Cat1	Cat2	Cat3
Ent1	Val1	Val1	Val2
Ent2	Val1	Val1	Val2
Ent3	Val2	Val2	Val1
Ent4	Val2	Val1	Val2
Ent5	Val3	Val1	Val2
Ent6	Val3	Val1	Val1
Ent7	Val3	Val2	Val2

rawdata <- data.frame(Entity = c('Ent1', 'Ent2', 'Ent3', 'Ent4', 'Ent5', 'Ent6', 'Ent7'),
                      Cat1 = c('Val1', 'Val1', 'Val2', 'Val2', 'Val3', 'Val3', 'Val3'),
                      Cat2 = c('Val1', 'Val1', 'Val2', 'Val1', 'Val1', 'Val1', 'Val2'),
                      Cat3 = c('Val2', 'Val2', 'Val1', 'Val2', 'Val2', 'Val1', 'Val2'))

I want to populate the "Count" column from the summary table with the appropriate counts for each category & value pair. Programmatically, what I would want to do would be to have a counter, go through the Raw Data Table and just update the count for each value. I think this would be very inefficient in R. What I thought I would do is filter for the values but because column names are not evaluated as variables, I am at a loss of how to do this.

What I have tried (and I what I think I want something like is):

library(dplyr)
summary$Count <- nrow(rawdata %>% filter(get(summary$Category) == get(summary$Value)))

This isn't working, however. How do I get the filter to take values from another table?

akrun akrun · Accepted Answer · 2021-01-13T17:23:00

We can reshape to 'long' format with pivot_longer and use count to get the frequency count

library(dplyr)
library(tidyr)
rawdata %>% 
  pivot_longer(cols = -Entity, names_to = "Category", values_to = "Value") %>% 
  count(Category, Value)

-output

# A tibble: 7 x 3
#  Category Value     n
#  <chr>    <chr> <int>
#1 Cat1     Val1      2
#2 Cat1     Val2      2
#3 Cat1     Val3      3
#4 Cat2     Val1      5
#5 Cat2     Val2      2
#6 Cat3     Val1      2
#7 Cat3     Val2      5

NOTE: pivot_longer reshapes the data from the 'wide' format to 'long' format. By specifying the cols = -Entity, it is converting the rest of the columns to 'long' format with the column name as "Category" specified by names_to and the corresponding values as "Value" (values_to)

Or using base R with table

subset(as.data.frame(table(data.frame(Category =
   names(rawdata)[-1][col(rawdata[-1])], 
        Value = unlist(rawdata[-1])))), Freq  > 0)

R: A way to filter based on values in another table?

1 Answers