
Hello, I would like to create a heatmap showing the cofrequency of several variables. Here is some code:

a <- c(1,1,1,1)
b <-c(1,1,1,0)
c<- c(1,1,0,0)
d <- c(1,0,0,0)

df <- cbind(a,b,c,d)
df
     a b c d
[1,] 1 1 1 1
[2,] 1 1 1 0
[3,] 1 1 0 0
[4,] 1 0 0 0

'1' means the phenomenon occurred; '0' means it did not.

The cofrequency of a and b is 75%, the cofrequency of a and c is 50%, and so on.
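As a quick check of that definition, here is a minimal sketch that computes the pairwise value directly. The helper name `cofreq` is made up for illustration; it assumes cofrequency means the share of positions where both vectors equal 1.

```r
# Vectors from the question (1 = phenomenon occurred, 0 = it did not)
a <- c(1, 1, 1, 1)
b <- c(1, 1, 1, 0)
c <- c(1, 1, 0, 0)

# Hypothetical helper: share of positions where both vectors equal 1
cofreq <- function(x, y) mean(x == 1 & y == 1)

cofreq(a, b)  # 0.75
cofreq(a, c)  # 0.5
cofreq(a, a)  # 1
```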

In the end, I would like a 4x4 heatmap with the column names on the x and y axes and the cofrequency percentage in each tile: a vs. a = 100%, a vs. b = 75%, etc.

May I ask for a little help?


The solution from the comments generates:

library(tidyr)
library(ggplot2)
a <- c(1,1,1,1)
b <-c(1,1,1,0)
c<- c(1,1,0,0)
d <- c(1,0,0,0)
df <- cbind(a,b,c,d)
calc_freq <- function(x, y) {
  mean(df[, x] == df[, y] & df[, x] == 1 & df[, y] == 1)
}
mat <- outer(colnames(df), colnames(df), Vectorize(calc_freq))
mat
dimnames(mat) <- list(colnames(df), colnames(df))
mat %>% as_tibble() %>% gather %>% ggplot() + aes(key, value) + geom_tile()


I would rather have the percentages from mat as the fill, and dimnames(mat) on the x-axis and y-axis.


1 Answer


There is probably a function that does this directly; in the meantime, here is one base R approach using outer. First we write a function that calculates, for two columns, the proportion of rows in which both are 1:

# x and y are column names of df; TRUE where both columns are 1 in the same row
calc_freq <- function(x, y) {
    mean(df[, x] == 1 & df[, y] == 1)
}

and apply it to every pair of columns using outer:

mat <- outer(colnames(df), colnames(df), Vectorize(calc_freq))
mat

#     [,1] [,2] [,3] [,4]
#[1,] 1.00 0.75 0.50 0.25
#[2,] 0.75 0.75 0.50 0.25
#[3,] 0.50 0.50 0.50 0.25
#[4,] 0.25 0.25 0.25 0.25

If you want row and column names, you can set them with dimnames:

dimnames(mat) <- list(colnames(df), colnames(df))

Each entry of mat is the proportion of rows in which both columns contain 1.
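For strictly 0/1 data with no missing values, the same matrix can also be obtained with a matrix product. This is only a sketch under that assumption; the name `mat2` is introduced here for illustration.

```r
a <- c(1, 1, 1, 1)
b <- c(1, 1, 1, 0)
c <- c(1, 1, 0, 0)
d <- c(1, 0, 0, 0)
df <- cbind(a, b, c, d)

# For a 0/1 matrix, t(df) %*% df counts the rows where both columns are 1;
# dividing by the number of rows turns those counts into proportions.
# Assumption: df contains only 0 and 1 and has no NA values.
mat2 <- crossprod(df) / nrow(df)
mat2
```

crossprod carries the column names of df over to both dimensions of the result, so no separate dimnames step is needed in this variant.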

To get the plot, we can do:

library(tidyverse)

data.frame(mat) %>%
    rownames_to_column() %>%
    gather(key, value, -rowname) %>%
    ggplot() + aes(rowname, key, fill = value) + 
    geom_tile()

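Since the question also asks for the percentage inside each tile, here is one possible variant that adds text labels. It is a sketch, not the only way to do it: the names `long`, `label` and `p` are illustrative, and it reshapes the matrix with base R's as.data.frame(as.table(...)), which yields the columns Var1, Var2 and Freq.

```r
library(ggplot2)

a <- c(1, 1, 1, 1)
b <- c(1, 1, 1, 0)
c <- c(1, 1, 0, 0)
d <- c(1, 0, 0, 0)
df <- cbind(a, b, c, d)

# Proportion of rows in which both columns are 1
calc_freq <- function(x, y) mean(df[, x] == 1 & df[, y] == 1)
mat <- outer(colnames(df), colnames(df), Vectorize(calc_freq))
dimnames(mat) <- list(colnames(df), colnames(df))

# Long format: one row per tile, with columns Var1, Var2 and Freq
long <- as.data.frame(as.table(mat))
long$label <- sprintf("%.0f%%", 100 * long$Freq)

p <- ggplot(long, aes(Var1, Var2, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = label), colour = "white") +
  labs(x = NULL, y = NULL, fill = "Cofrequency")
p
```

Because as.table keeps the dimnames as factor levels, the axes follow the column order of df.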