
(I have updated my question; thanks to @Sammy, by the way ;-) )

I want to plot Venn diagrams with the VennDiagram package. First, I need to summarize my data properly, but I can't find a command that does this easily and automatically.

My data frame is as follows:

> str(a)

'data.frame':   401 obs. of  3 variables:
 $ quant    : int  77 55 66 55 44 11 22 33 66 588 ...
 $ condition: Factor w/ 3 levels "a","b","c": 1 1 1 1 1 1 1 1 1 1 ...
 $ name     : Factor w/ 389 levels "dukthw1","dukthw10",..: 126 150 160 171 182 193 203 213 224 127 ...

> head(a)
  quant condition       name
1    77         a    jkhsgf1
2    55         a    jkhsgf2
3    66         a    jkhsgf3
4    55         a    jkhsgf4
5    44         a    jkhsgf5
6    11         a kluhswrg10

I would like to know:

1/ how many "names" are shared between the levels of condition, for example:

a and b = 10

a and c = 8

b and c = 8

a, b and c = 3

2/ which names are shared in each of the intersections counted in 1/.
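A minimal base-R sketch for point 2/, assuming a data frame shaped like `a` (columns `condition` and `name`); `shared_names()` is a hypothetical helper, not from any package. It builds one set of names per condition with `split()`, then uses `combn()` and `Reduce(intersect, ...)` to list the names shared by every combination of two or more conditions. These are inclusive intersections, so a name found in a, b and c also appears under "a & b"; calling `lengths()` on the result gives the counts asked for in point 1/.

```r
# Hypothetical helper: for every combination of 2+ conditions, list the
# names present in all of them (inclusive intersections).
shared_names <- function(df) {
  # one vector of unique names per condition
  sets <- lapply(split(as.character(df$name), df$condition), unique)
  conds <- names(sets)
  out <- list()
  for (k in seq_along(conds)[-1]) {          # combination sizes 2..n
    for (combo in combn(conds, k, simplify = FALSE)) {
      key <- paste(combo, collapse = " & ")
      out[[key]] <- Reduce(intersect, sets[combo])
    }
  }
  out
}

# Made-up toy data in the same shape as `a`
toy <- data.frame(
  condition = factor(c("a", "a", "a", "b", "b", "c", "c")),
  name      = factor(c("x1", "x2", "x3", "x1", "x2", "x1", "x3"))
)
res <- shared_names(toy)
lengths(res)  # point 1/: how many names per combination
res           # point 2/: which names
```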

Here is the code I currently use to get this information and draw the diagram:

library(VennDiagram)
a<- read.delim("test2.txt")

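aaa <- table(a$condition, a$name)  # how often each name occurs per condition
ta <- t(aaa)                       # names as rows, conditions as columns
assay <- as.data.frame.matrix(ta)  # one column per condition (a, b, c)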

str(assay)
head(assay)

nrow(subset(assay, a == 1))
nrow(subset(assay, b == 1))
nrow(subset(assay, c == 1))

nrow(subset(assay, a == 1 & b == 1))
nrow(subset(assay, b == 1 & c == 1))
nrow(subset(assay, a == 1 & c == 1))

nrow(subset(assay, a == 1 & b == 1 & c == 1))


draw.triple.venn(area1 = 112, area2 = 160, area3 = 129, n12 = 10, n23 = 7, n13 = 6, 
             n123 = 4, category = c("a", "b", "c"), lty = "blank", 
             fill = c("skyblue", "pink1", "mediumorchid"))

HOWEVER,

I am still not happy with this, because it is tedious. This is just a small example data frame; what if I have a huge one with hundreds of conditions instead of only "a", "b" and "c"? I cannot type every "nrow(subset(assay, c == 1))" call by hand.
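One way to avoid typing each `nrow(subset(...))` call is to loop over combinations of the columns of `assay`. This is a minimal sketch, assuming `assay` is the names-by-conditions count table built above; `count_overlaps()` and its `max_k` argument are hypothetical. A name counts as present in a condition when its count is greater than zero, and the counts are inclusive (a name found in a, b and c also counts towards "a & b").

```r
# Hypothetical helper: for each combination of up to `max_k` conditions,
# count the names (rows of `assay`) present in every one of them.
count_overlaps <- function(assay, max_k = ncol(assay)) {
  present <- as.matrix(assay) > 0            # TRUE where the name occurs
  conds <- colnames(present)
  counts <- integer(0)
  for (k in seq_len(min(max_k, length(conds)))) {
    for (combo in combn(conds, k, simplify = FALSE)) {
      # a row is in the intersection only if all k selected columns are TRUE
      n <- sum(rowSums(present[, combo, drop = FALSE]) == k)
      counts[paste(combo, collapse = " & ")] <- n
    }
  }
  counts
}

# Made-up toy data in the same shape as the question's `a`
toy <- data.frame(condition = c("a", "a", "a", "b", "b", "c", "c"),
                  name      = c("x1", "x2", "x3", "x1", "x2", "x1", "x3"))
assay <- as.data.frame.matrix(t(table(toy$condition, toy$name)))
count_overlaps(assay)             # all combinations
count_overlaps(assay, max_k = 2)  # single conditions and pairs only
```

With hundreds of conditions the number of combinations grows very quickly (2^n - 1 in total), so a small `max_k` such as 2 or 3 is probably what you want there.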

Does anyone know the best way to summarize my data like this and get the output directly as a table or data frame? Is there an automated approach or a better command? I have the same issue with VennDiagram: with more than 3 groups, entering every overlap count by hand in "draw.triple.venn(area1 = 112, ...)" is very tedious.
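The numbers `draw.triple.venn()` needs can be computed from the sets instead of typed in. A minimal sketch, assuming three named character vectors of names; `triple_venn_args()` is a hypothetical helper. `n12`, `n23`, `n13` and `n123` are filled with inclusive overlap sizes, the same quantities the question computes with `nrow(subset(...))`. The drawing call is guarded so it only runs in an interactive session when VennDiagram is installed.

```r
# Hypothetical helper: build the argument list for
# VennDiagram::draw.triple.venn() from three named sets.
triple_venn_args <- function(sets) {
  stopifnot(length(sets) == 3)
  s <- lapply(sets, unique)
  list(
    area1    = length(s[[1]]),
    area2    = length(s[[2]]),
    area3    = length(s[[3]]),
    n12      = length(intersect(s[[1]], s[[2]])),
    n23      = length(intersect(s[[2]], s[[3]])),
    n13      = length(intersect(s[[1]], s[[3]])),
    n123     = length(Reduce(intersect, s)),
    category = names(sets)
  )
}

# Made-up sets; with the question's data frame these could come from
# lapply(split(as.character(a$name), a$condition), unique)
sets <- list(a = c("x1", "x2", "x3"), b = c("x1", "x2"), c = c("x1", "x3"))
args <- triple_venn_args(sets)
str(args)

if (interactive() && requireNamespace("VennDiagram", quietly = TRUE)) {
  do.call(VennDiagram::draw.triple.venn,
          c(args, list(lty = "blank",
                       fill = c("skyblue", "pink1", "mediumorchid"))))
}
```

For more than three groups, VennDiagram also provides `draw.quad.venn()` and `draw.quintuple.venn()`, plus `venn.diagram()`, which takes a list of sets directly and computes the overlaps itself; check their help pages for the exact arguments.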

I have uploaded my (small) example data frame as a .txt file: https://www.dropbox.com/s/kj9hdze83xeltlz/test.txt?dl=0

Many thanks in advance for any help!


2 Answers

Answer 1

You may want to give nVennR a try. I developed this package for situations like this. Here is one way to achieve what (I believe) you want in your example:

library(nVennR)
tmp <- read.table(path_to_test_data)
toV.a <- subset(tmp, V2 == "a")$V3
toV.b <- subset(tmp, V2 == "b")$V3
toV.c <- subset(tmp, V2 == "c")$V3
abcVenn <- plotVenn(list(a=toV.a, b=toV.b, c=toV.c), setColors=c("skyblue", "pink1", "mediumorchid"), borderWidth = 0)

You should get this figure in the plot window:

[Figure: Venn diagram of sets a, b and c]

Now abcVenn contains information about the intersections, so you can ask which elements fall in each region. For instance:

getVennRegion(abcVenn, c("a", "c"))
[1] "dukthw53" "jkhsgf28"

getVennRegion(abcVenn, c("a", "b", "c"))
[1] "jkhsgf4"  "jkhsgf44" "dukthw51"
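If you want the same kind of lookup without nVennR, a region can also be computed in base R. A minimal sketch, assuming a region means the elements that are in every listed set and in none of the others (the usual reading of a Venn-diagram region); `venn_region()` is a hypothetical helper, not part of nVennR's API.

```r
# Hypothetical base-R helper: elements in every set listed in `region`
# and in none of the remaining sets (an exclusive Venn region).
venn_region <- function(sets, region) {
  inside  <- Reduce(intersect, sets[region])
  others  <- setdiff(names(sets), region)
  outside <- unique(unlist(sets[others], use.names = FALSE))
  setdiff(inside, outside)
}

# Made-up sets
sets <- list(a = c("x1", "x2", "x3", "x4"),
             b = c("x1", "x2"),
             c = c("x1", "x3"))
venn_region(sets, c("a", "c"))       # in a and c but not in b
venn_region(sets, c("a", "b", "c"))  # in all three sets
venn_region(sets, "a")               # only in a
```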

If you install the package (it is on CRAN), you can type vignette("nVennR") for details and examples.
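With many conditions, the per-condition `subset()` lines above can be replaced by a single `split()` call, which builds one vector of names per condition, named after the condition. A minimal sketch, assuming `tmp` was read with `read.table()` as above (conditions in `V2`, names in `V3`); the toy `tmp` below is made up. The plotting call is guarded so it only runs interactively when nVennR is installed, and it leaves the colours at their defaults.

```r
# Made-up stand-in for the data read with read.table() (columns V1, V2, V3)
tmp <- data.frame(V1 = c(77, 55, 66, 55, 44),
                  V2 = c("a", "a", "b", "b", "c"),
                  V3 = c("n1", "n2", "n1", "n3", "n1"))

# One character vector of names per condition, named after the condition
sets <- split(as.character(tmp$V3), tmp$V2)

if (interactive() && requireNamespace("nVennR", quietly = TRUE)) {
  myVenn <- nVennR::plotVenn(sets, borderWidth = 0)
}
```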

Answer 2

Try using the table function. It's a simple command to explore your data beforehand. https://www.r-bloggers.com/r-function-of-the-day-table/
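As a sketch of how `table()` can answer question 1/ for all pairs of conditions at once: cross-tabulate name against condition, turn the counts into presence/absence, and take `crossprod()`. The diagonal then holds the number of names per condition and each off-diagonal cell the number of names shared by that pair. The toy data frame is made up; in the question it would be `a`.

```r
# Made-up toy data shaped like the question's `a`
df <- data.frame(condition = c("a", "a", "a", "b", "b", "c", "c"),
                 name      = c("x1", "x2", "x3", "x1", "x2", "x1", "x3"))

# Names x conditions matrix of counts, then presence/absence as 0/1
counts <- unclass(table(df$name, df$condition))
incidence <- (counts > 0) * 1L

# t(incidence) %*% incidence: diagonal = names per condition,
# off-diagonal [i, j] = names shared by conditions i and j
shared <- crossprod(incidence)
shared
```

This only covers single conditions and pairs; overlaps of three or more conditions still need a loop over combinations.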

(Displaying head(data) would have been a bit more helpful)