I'm trying to plot indel lengths from a file created by vcftools with the "--hist-indel-len" option. From that file I want to make a plot of insertions and deletions: a negative length is a deletion and a positive length is an insertion. The COUNT column should go on the y-axis, from 0 to the maximum value, and the x-axis should run from the minimum length (-15 in this case) to the maximum length (15 in this case).

The data looks like:

   LENGTH  COUNT
1    -15    117
2    -14    178
3    -13    198
4    -12    414
5    -11    314
6    -10    451
7     -9    547
8     -8   1114
9     -7   1214
10    -6   2371
11    -5   3822
12    -4   9229
13    -3  17333
14    -2  20373
15    -1  19774
16     0 202129
17     1  22259
18     2  10101
19     3   4940
20     4   2458
21     5   1343
22     6    987
23     7    535
24     8    427
25     9    317
26    10    307
27    11    161
28    12    270
29    13    116
30    14    121
31    15     95

With this data.frame I'm trying to get a plot like: enter image description here

My attempt was using:

z <- read.csv("/home/userx/out.indel.hist", sep = "\t")

zz <- table(z)

barplot(zz, main="Insertion and Deletions",
         xlab="Length", ylab="Count", col=c("darkblue","red"),
         legend = rownames(zz), beside=TRUE)

Result:

enter image description here

Any help would be appreciated.

1 Answer

A relatively easy solution using ggplot2 with the provided data is to create a grouping variable to color by, and then plot with geom_col:

library(tidyverse)

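Create the grouping variable (dat2 is the data frame given at the end of this answer; note that a length of 0 falls into the "plus" group):

dat2 <- dat2 %>%
  mutate(fill = ifelse(LENGTH < 0, "minus", "plus"))

ggplot(dat2) +
  geom_col(aes(x = LENGTH, y = COUNT, fill = fill))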

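For comparison, here is a minimal base R sketch that stays closer to the barplot attempt in the question. The attempt most likely failed because table(z) cross-tabulates LENGTH against COUNT into a 31 x 31 matrix instead of using COUNT as the bar heights; the file is already a histogram, so the counts can be passed to barplot directly. The data frame name `indels`, the `TYPE` column, the "no indel" label for length 0, and the colors are assumptions made for illustration, not part of the original code.

```r
# Data copied from the question (vcftools --hist-indel-len output).
indels <- data.frame(
  LENGTH = -15:15,
  COUNT = c(117, 178, 198, 414, 314, 451, 547, 1114, 1214, 2371, 3822,
            9229, 17333, 20373, 19774, 202129, 22259, 10101, 4940, 2458,
            1343, 987, 535, 427, 317, 307, 161, 270, 116, 121, 95)
)

# Classify each length: negative = deletion, positive = insertion,
# and 0 = neither (the label "no indel" is an assumption).
indels$TYPE <- ifelse(indels$LENGTH < 0, "deletion",
                      ifelse(indels$LENGTH > 0, "insertion", "no indel"))

# Hypothetical color choice, one color per class.
type_cols <- c("deletion" = "red", "insertion" = "darkblue",
               "no indel" = "grey60")

# Use COUNT as the bar heights and LENGTH as the bar labels;
# no call to table() is needed.
barplot(indels$COUNT,
        names.arg = indels$LENGTH,
        col = unname(type_cols[indels$TYPE]),
        main = "Insertions and Deletions",
        xlab = "Length", ylab = "Count")
legend("topright", legend = names(type_cols), fill = unname(type_cols))
```

To use the real file instead of the pasted values, the data.frame() call could be replaced with the read.csv(..., sep = "\t") line from the question.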

The data (assign it to dat2 before running the code above):

dat2 <- structure(list(LENGTH = -15:15, COUNT = c(117L, 178L, 198L, 414L, 
314L, 451L, 547L, 1114L, 1214L, 2371L, 3822L, 9229L, 17333L, 
20373L, 19774L, 202129L, 22259L, 10101L, 4940L, 2458L, 1343L, 
987L, 535L, 427L, 317L, 307L, 161L, 270L, 116L, 121L, 95L)), .Names = c("LENGTH", 
"COUNT"), class = "data.frame", row.names = c(NA, -31L))