I have a data set of individuals with a number of health conditions. Each individual either does (1) or does not (0) have each condition (my real data set has 14 conditions). I want to summarise the data so that I know how often each pair of conditions occurs together. Some individuals may have three or four of the conditions, but what I'm interested in is the pairwise co-occurrence. I would then like to plot this as a heatmap.
I suspect that the solution involves the 'gather' function from tidyr, but I haven't been able to work it out. This is an example of what my input looks like and what I'd like to achieve:
Here's some data on individuals and whether or not they have conditions "a", "b" or "c":
library(tidyverse)
library(viridis)
dat <- tibble(
  id = 1:15,
  a = c(1,0,0,0,1,1,1,0,1,0,0,0,1,0,1),
  b = c(1,0,0,1,1,1,0,0,1,0,0,1,1,0,1),
  c = c(0,0,1,1,0,1,0,1,0,1,1,0,1,1,0))
I want to summarise how often each condition occurs and how often each pair of conditions co-occurs. In this case, conditions "a" and "b" co-occur more often than either of them does with "c", which usually occurs on its own. Below is how I imagine the data would look in a plottable format: the first column is 'variable 1', the second is 'variable 2', and the third is the count of how often the two occur together (where both variables are the same, it is simply how often that condition occurs). Below that is the plot I have in mind.
plotdat <- tibble(
  var1 = c("a", "a", "a", "b", "b", "c"),
  var2 = c("a", "b", "c", "b", "c", "c"),
  count = c(7, 6, 2, 8, 3, 8))
ggplot(plotdat) +
  geom_tile(aes(var1, var2, fill = count)) +
  scale_fill_viridis()
Perhaps this is not the right approach at all and I actually need to convert the data into a 3x3 matrix. Any possible solutions would be gratefully received!
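To make the matrix idea concrete, here is a minimal sketch of what I mean, using only base R and the same toy data. It assumes the condition columns are strictly 0/1, so that the cross-product of the indicator matrix gives pairwise counts. The names `m`, `co` and `long` are just placeholders of mine, and I'm not sure this is the best or most "tidy" way to do it.

```r
# Same toy data as above, as a plain data frame (base R only)
dat <- data.frame(
  id = 1:15,
  a = c(1,0,0,0,1,1,1,0,1,0,0,0,1,0,1),
  b = c(1,0,0,1,1,1,0,0,1,0,0,1,1,0,1),
  c = c(0,0,1,1,0,1,0,1,0,1,1,0,1,1,0)
)

# Keep only the condition columns as a 0/1 numeric matrix
m <- as.matrix(dat[, c("a", "b", "c")])

# crossprod(m) is t(m) %*% m: entry [i, j] counts the individuals who
# have both condition i and condition j; the diagonal holds the count
# for each condition on its own (assumes values are strictly 0 or 1)
co <- crossprod(m)

# Reshape the 3x3 matrix into long format: one row per (var1, var2) pair
long <- as.data.frame(as.table(co), responseName = "count")
names(long)[1:2] <- c("var1", "var2")
```

Unlike my hand-written `plotdat`, `long` contains all nine cells of the symmetric matrix rather than only the six unique pairs, so passing it to the same `geom_tile()` call would fill both halves of the heatmap.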