I have some data from which I am creating a stacked barplot using ggplot2. Each sample on the x axis also has three factors associated with it. Between the barplot and the sample names, I would like to plot a coloured square for each sample, with the colour determined by the factor level. As there are three factors per sample, I would like to plot these as three rows of squares, a little like a waffle chart. The data below show the three factor columns I have for each sample: "tissue_type", "biopsy_type", and "gleason_score". Is there any way to plot all of these together, i.e. the stacked barplot plus a kind of waffle chart?

data:

> total
 aberration_type Freq              sample_name tissue_type biopsy_type
1    homozygous_loss   42 160078-N_S16_L001_R1_001      Normal      Normal
2  heterozygous_loss  200 160078-N_S16_L001_R1_001      Normal      Normal
3    homozygous_loss   56 160078-T_S17_L001_R1_001      Tumour      Repeat
4  heterozygous_loss 1917 160078-T_S17_L001_R1_001      Tumour      Repeat
5               gain  666 160078-T_S17_L001_R1_001      Tumour      Repeat
6    homozygous_loss   42  160079-N_S7_L001_R1_001      Normal      Normal
7  heterozygous_loss   78  160079-N_S7_L001_R1_001      Normal      Normal
8    homozygous_loss  193  160079-T_S8_L001_R1_001      Tumour      Repeat
9  heterozygous_loss 4336  160079-T_S8_L001_R1_001      Tumour      Repeat
10              gain  129  160079-T_S8_L001_R1_001      Tumour      Repeat
11   homozygous_loss   42 160080-N_S20_L001_R1_001      Normal      Normal
12 heterozygous_loss   78 160080-N_S20_L001_R1_001      Normal      Normal
13   homozygous_loss   42 160081-N_S21_L001_R1_001      Normal      Normal
14 heterozygous_loss   76 160081-N_S21_L001_R1_001      Normal      Normal
15   homozygous_loss   42 160081-T_S22_L001_R1_001      Tumour      Repeat
16 heterozygous_loss 1191 160081-T_S22_L001_R1_001      Tumour      Repeat
17              gain   59 160081-T_S22_L001_R1_001      Tumour      Repeat
18   homozygous_loss   42 160082-N_S23_L001_R1_001      Normal      Normal
19 heterozygous_loss    6 160082-N_S23_L001_R1_001      Normal      Normal
20   homozygous_loss   42 160083-N_S24_L001_R1_001      Normal      Normal
21 heterozygous_loss    6 160083-N_S24_L001_R1_001      Normal      Normal
22   homozygous_loss   42 160083-T_S25_L001_R1_001      Tumour      Repeat
23 heterozygous_loss  515 160083-T_S25_L001_R1_001      Tumour      Repeat
24              gain   88 160083-T_S25_L001_R1_001      Tumour      Repeat
25   homozygous_loss   42 160084-N_S26_L001_R1_001      Normal      Normal
26 heterozygous_loss   79 160084-N_S26_L001_R1_001      Normal      Normal
27   homozygous_loss   42 160084-T_S27_L001_R1_001      Tumour     Initial
28 heterozygous_loss  671 160084-T_S27_L001_R1_001      Tumour     Initial
29              gain   56 160084-T_S27_L001_R1_001      Tumour     Initial
30   homozygous_loss   42  160088-N_S5_L001_R1_001      Normal      Normal
31 heterozygous_loss   63  160088-N_S5_L001_R1_001      Normal      Normal
32   homozygous_loss   42  160088-T_S6_L001_R1_001      Tumour     Initial
33 heterozygous_loss    6  160088-T_S6_L001_R1_001      Tumour     Initial
34   homozygous_loss   42 160089-N_S28_L001_R1_001      Normal      Normal
35 heterozygous_loss  114 160089-N_S28_L001_R1_001      Normal      Normal
36   homozygous_loss  113 160089-T_S29_L001_R1_001      Tumour      Repeat
37 heterozygous_loss 4196 160089-T_S29_L001_R1_001      Tumour      Repeat
38              gain    8 160089-T_S29_L001_R1_001      Tumour      Repeat
39   homozygous_loss   42 160090-N_S13_L001_R1_001      Normal      Normal
40 heterozygous_loss   75 160090-N_S13_L001_R1_001      Normal      Normal
41   homozygous_loss   42 160091-N_S14_L001_R1_001      Normal      Normal
42 heterozygous_loss   74 160091-N_S14_L001_R1_001      Normal      Normal
43   homozygous_loss   42 160091-T_S15_L001_R1_001      Tumour      Repeat
44 heterozygous_loss  194 160091-T_S15_L001_R1_001      Tumour      Repeat
45   homozygous_loss   41  160093-N_S9_L001_R1_001      Normal      Normal
46 heterozygous_loss    6  160093-N_S9_L001_R1_001      Normal      Normal
47   homozygous_loss   42 160093-T_S10_L001_R1_001      Tumour     Initial
48 heterozygous_loss 1034 160093-T_S10_L001_R1_001      Tumour     Initial
49   homozygous_loss   42 160094-N_S11_L001_R1_001      Normal      Normal
50 heterozygous_loss   77 160094-N_S11_L001_R1_001      Normal      Normal
51   homozygous_loss   42 160094-T_S12_L001_R1_001      Tumour      Repeat
52 heterozygous_loss 2192 160094-T_S12_L001_R1_001      Tumour      Repeat
53              gain   10 160094-T_S12_L001_R1_001      Tumour      Repeat
54   homozygous_loss   42  160095-N_S1_L001_R1_001      Normal      Normal
55 heterozygous_loss   76  160095-N_S1_L001_R1_001      Normal      Normal
56   homozygous_loss   41  160095-T_S2_L001_R1_001      Tumour     Initial
57 heterozygous_loss  442  160095-T_S2_L001_R1_001      Tumour     Initial
58   homozygous_loss   42  160096-N_S4_L001_R1_001      Normal      Normal
59 heterozygous_loss    6  160096-N_S4_L001_R1_001      Normal      Normal
60   homozygous_loss   42  160096-T_S4_L001_R1_001      Tumour      Repeat
61 heterozygous_loss  484  160096-T_S4_L001_R1_001      Tumour      Repeat
62   homozygous_loss   42  160098-N_S4_L001_R1_001      Normal      Normal
63 heterozygous_loss   68  160098-N_S4_L001_R1_001      Normal      Normal
64   homozygous_loss   42  160098-T_S4_L001_R1_001      Tumour     Initial
65 heterozygous_loss  598  160098-T_S4_L001_R1_001      Tumour     Initial
   gleason_score
1         Normal
2         Normal
3            3_4
4            3_4
5            3_4
6         Normal
7         Normal
8            3_3
9            3_3
10           3_3
11        Normal
12        Normal
13        Normal
14        Normal
15           3_3
16           3_3
17           3_3
18        Normal
19        Normal
20        Normal
21        Normal
22           3_3
23           3_3
24           3_3
25        Normal
26        Normal
27           3_3
28           3_3
29           3_3
30        Normal
31        Normal
32           3_3
33           3_3
34        Normal
35        Normal
36           3_4
37           3_4
38           3_4
39        Normal
40        Normal
41        Normal
42        Normal
43           3_3
44           3_3
45        Normal
46        Normal
47           3_3
48           3_3
49        Normal
50        Normal
51           3_4
52           3_4
53           3_4
54        Normal
55        Normal
56           3_3
57           3_3
58        Normal
59        Normal
60           3_4
61           3_4
62        Normal
63        Normal
64           3_3
65           3_3

This is how I am currently making the stacked barplot with ggplot2:

ggplot(data = total, aes(x = reorder(sample_name, -Freq), y = Freq, fill = aberration_type)) +
    geom_bar(stat="identity") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 5)) +
    ggtitle("Frequency aberrant bins") +
    xlab("Sample Name") +
    ylab("Frequency")

1 Answer

I'm sure it's possible, but in general faceting is a good approach to visualizing data by several factors. Here's an initial attempt. The sample labels get a bit crowded and the bars are less clear (at least in this small version), but it does illustrate the main finding: higher heterozygous loss in tumour biopsies.

library(ggplot2)

# Stacked bars per sample, split into panels by the three sample-level factors
ggplot(total, aes(x = reorder(sample_name, -Freq),
                  y = Freq,
                  fill = aberration_type)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 5)) +
  labs(title = "Frequency aberrant bins",
       x = "Sample Name",
       y = "Frequency") +
  facet_grid(biopsy_type ~ tissue_type + gleason_score)

For clearer charts you could facet on fewer factors, e.g. just biopsy type:

+ facet_grid(biopsy_type ~ .)
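
If you do want the rows of squares described in the question, here is a rough sketch that uses only ggplot2. It is one possible way, not the only one. It draws the squares as a second layer of square points (shape = 15) below zero on the y axis and maps them to colour, so they get a legend separate from the bars' fill. The toy data frame is just the first five rows of the data in the question; annot, factors, samples and row_height are names made up for this example, and the 5% row height and point size are arbitrary choices you would need to tune.

```r
library(ggplot2)

# Toy stand-in for `total`: the first five rows of the data in the question.
total <- data.frame(
  aberration_type = c("homozygous_loss", "heterozygous_loss",
                      "homozygous_loss", "heterozygous_loss", "gain"),
  Freq            = c(42, 200, 56, 1917, 666),
  sample_name     = rep(c("160078-N_S16_L001_R1_001",
                          "160078-T_S17_L001_R1_001"), times = c(2, 3)),
  tissue_type     = rep(c("Normal", "Tumour"), times = c(2, 3)),
  biopsy_type     = rep(c("Normal", "Repeat"), times = c(2, 3)),
  gleason_score   = rep(c("Normal", "3_4"), times = c(2, 3)),
  stringsAsFactors = FALSE
)

# Fix the sample order once so that both layers share the same x axis order.
total$sample_name <- factor(
  total$sample_name,
  levels = levels(reorder(total$sample_name, -total$Freq))
)

# Build one row per sample per factor (long format).
# Prefixing the level with the factor name keeps e.g. "Normal" distinct
# across the three factors in the shared colour legend.
factors <- c("tissue_type", "biopsy_type", "gleason_score")
samples <- unique(total[, c("sample_name", factors)])
annot <- do.call(rbind, lapply(seq_along(factors), function(i) {
  data.frame(
    sample_name = samples$sample_name,
    level       = paste(factors[i], samples[[factors[i]]], sep = ": "),
    row         = i
  )
}))

# Place each row of squares below zero.
# Assumption: 5% of the tallest stacked bar per row looks reasonable.
row_height <- 0.05 * max(tapply(total$Freq, total$sample_name, sum))
annot$y <- -annot$row * row_height

p <- ggplot(total, aes(x = sample_name, y = Freq)) +
  geom_col(aes(fill = aberration_type)) +
  geom_point(data = annot, aes(y = y, colour = level),
             shape = 15, size = 3) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 5)) +
  labs(title = "Frequency aberrant bins",
       x = "Sample Name",
       y = "Frequency",
       colour = "Annotation")

print(p)
```

Two caveats: all three factors share a single colour legend, and the point size is in absolute units, so the squares do not resize with the bars. If that becomes a problem, drawing the squares as a separate plot and stacking it under the barplot with a plot-composition package such as patchwork or cowplot is probably more robust.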