4
votes

Here is the dataset

structure(list(V1 = structure(c(2L, 3L, 4L, 5L, 3L, 4L, 5L, 4L, 
5L), .Label = c("ABC", "DEF", "GHI", "JKL", "MNO"), class = "factor"), 
V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), .Label = c("ABC", 
"DEF", "GHI", "JKL", "MNO"), class = "factor"), Freq = c(3L,                                                                                                                                     
2L, 3L, 2L, 3L, 1L, 2L, 1L, 1L)), row.names = c(NA, 9L), class = "data.frame")
                                                                            

which looks like this

   V1  V2 Freq
1  DEF ABC    3
2  GHI ABC    2
3  JKL ABC    3
4  MNO ABC    2
5  GHI DEF    3
6  JKL DEF    1
7  MNO DEF    2
8  JKL GHI    1
9  MNO GHI    1

I would like to create something like this

x       y
ABC     4
DEF     4
GHI     4
JKL     3
MNO     3

Note that ABC has an association 4 times with DEF, GHI, JKL, and MNO. This corresponds to line 1,2,3 and 4. JKL has an association 3 times with ABC,DEF, and GHI. This corresponds to line 3,6,and 8.

5

5 Answers

3
votes

We can do this in a couple of ways

  1. We split, each column by the other into a list. Get the lengths, add (+) and stack it to 2 column data.frame
stack(lengths(split(as.character(df1$V2), df1$V1)) + 
       lengths(split(as.character(df1$V1), df1$V2)))[2:1]

-output

 ind values
1 ABC      4
2 DEF      4
3 GHI      4
4 JKL      3
5 MNO      3

  1. Or do the table on the columns, + them, get the rowSums, and stack it
stack( rowSums(table(df1[2:1]) + table(df1[1:2])))[2:1]
3
votes

This solution can also be used:

library(dplyr)
library(purrr)
library(tibble)

df %>%
  select(V1, V2) %>%
  unlist() %>% 
  unique() %>% 
  enframe() %>%
  mutate(freq = map_int(value, function(a) pmap_lgl(df[-3], ~ a %in% c(...)) %>% reduce(`+`))) %>%
  arrange(value) %>%
  select(-name)

# A tibble: 5 x 2
  value  freq
  <fct> <int>
1 ABC       4
2 DEF       4
3 GHI       4
4 JKL       3
5 MNO       3
3
votes

We can try table + stack

> rev(stack(table(unlist(df[-3]))))
  ind values
1 ABC      4
2 DEF      4
3 GHI      4
4 JKL      3
5 MNO      3

or an igraph option with degree like below

> rev(stack(degree(graph_from_data_frame(df))))
  ind values
1 DEF      4
2 GHI      4
3 JKL      3
4 MNO      3
5 ABC      4
1
votes

If I understand correctly, the OP wants to count the occurrences of each item in both columns V1 and V2.

This can be solved by reshaping and counting the occurrences in long format:

library(data.table)
melt(setDT(df), measure.vars = patterns("^V"))[, .N, keyby = value]
   value N
1:   ABC 4
2:   DEF 4
3:   GHI 4
4:   JKL 3
5:   MNO 3

or with dplyr & tidyr:

library(dplyr)
library(tidyr)
df %>% 
  pivot_longer(starts_with("V")) %>% 
  count(value)
# A tibble: 5 x 2
  value     n
  <fct> <int>
1 ABC       4
2 DEF       4
3 GHI       4
4 JKL       3
5 MNO       3
1
votes

library janitor also helps

janitor::tabyl(c(df$V1, df$V2))
#>  c(df$V1, df$V2) n   percent
#>              ABC 4 0.2222222
#>              DEF 4 0.2222222
#>              GHI 4 0.2222222
#>              JKL 3 0.1666667
#>              MNO 3 0.1666667

Created on 2021-07-03 by the reprex package (v2.0.0)