I have a table of data that already contains several values to be plotted as a bar plot with the ggplot2 package (the data are already cumulative).

The data in the data frame "reserves" has the form (simplified):

period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15

The first column (period) is the geological epoch. It goes on the x axis, and I need ggplot2 to keep the periods in the given order rather than sorting them, so I set the factor levels explicitly with the command

reserves$period <- factor(reserves$period, levels = reserves$period)
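
To illustrate why this is needed: by default, factor() sorts its levels alphabetically, which would put N before P. A minimal base-R sketch with the four periods from the sample data (the vector name `period` is only for illustration):

```r
# Periods in the order they appear in the data (geological, not alphabetical)
period <- c("J", "K", "P", "N")

# Default: levels are sorted alphabetically
default_levels <- levels(factor(period))

# Explicit levels: the order of appearance is kept,
# and ggplot2 uses the level order on a discrete x axis
kept_levels <- levels(factor(period, levels = period))

print(default_levels)
print(kept_levels)
```

Note that levels = period only works while every period appears once; with repeated values (for example after reshaping to long format), levels = unique(period) would be the safer form.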

The column "amount" is the main column to be plotted on the y axis (it is the percentage of hydrocarbons in each epoch, but it could be in absolute values as well, say, millions of tons or whatever). So the basic plot is invoked by the command:

ggplot(reserves,aes(x=period,y=amount)) + geom_bar(stat="identity")

But here is the question. I need to plot the other values, that is a1-a2, b1-b2, and h1-h4, on the same bar graph. These are percentages within each letter group (for example, if a1=60, then a2=40; the same holds for b1-b2, and h1-h4 also sum to 100). So I need the values a1-a2 shown in some color, dividing the "amount" bar proportionally for each value of x (a stacked bar plot), and the same for b1-b2; this gives two adjacent columns for each period (grouped bar plots), each of them stacked. Next, I need a third column for the values h1-h4, perhaps also as a stacked bar plot, placed either as a third column or as a staggered bar plot above the first one.
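
Since the stacking relies on each group summing to 100, it may be worth checking that before plotting. A minimal base-R sketch on the simplified sample above (the names `group_sums` and `bad` are only for illustration):

```r
reserves <- read.csv(text = "period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15")

# Sum the percentage columns of each letter group, row by row
group_sums <- data.frame(
  period = reserves$period,
  a = rowSums(reserves[, c("a1", "a2")]),
  b = rowSums(reserves[, c("b1", "b2")]),
  h = rowSums(reserves[, c("h1", "h2", "h3", "h4")])
)
print(group_sums)

# Rows in which at least one group does not add up to 100
bad <- group_sums[rowSums(group_sums[, c("a", "b", "h")] != 100) > 0, ]
print(bad)
```

On the simplified sample this flags period J, where a1 + a2 is 90; that may simply be a typo in the sample, but a stacked "a" bar for J would then stop short of the "amount" value.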

So the layout looks like this:

[Figure: layout of a combined barplot]

I learned that I first need to reshape the data with the reshape2 package and then use position="dodge" or position="fill" in geom_bar(), but here I need a combination of the two. And the third bar plot (for the values h1-h4) seems to need a "stacked percent" representation with fixed height.

Are there packages which handle the data for plotting in a more intuitive way? Let's say we just declare that we want the variables ai, bi, hi to be plotted.

1 Answer

First you should reshape your data from wide to long, then scale your proportions to their raw values. Then split your old column names (now values of "lett") into their letters and numbers for labeling. If your real data aren't named like this (a1...h4), there are ways to handle that as well.

library(dplyr)
library(tidyr)
library(ggplot2)

reserves <- read.csv(text = "period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15") 

reserves.tidied <- reserves %>% 
  gather(key = lett, value = prop, -period, -amount) %>% 
  mutate(rawvalue = prop * amount/100,
         lett1 = substr(lett, 1, 1),
         num = substr(lett, 2, 2)) 

reserves.tidied
   period amount lett prop rawvalue lett1 num
1       J   18.1   a1   30    5.430     a   1
2       K   29.0   a1   65   18.850     a   1
3       P   13.3   a1   94   12.502     a   1
4       N   21.6   a1   95   20.520     a   1
5       J   18.1   a2   60   10.860     a   2
6       K   29.0   a2   35   10.150     a   2
7       P   13.3   a2    6    0.798     a   2
8       N   21.6   a2    5    1.080     a   2
9       J   18.1   b1   40    7.240     b   1
10      K   29.0   b1   75   21.750     b   1
11      P   13.3   b1   85   11.305     b   1
12      N   21.6   b1   80   17.280     b   1
13      J   18.1   b2   60   10.860     b   2
14      K   29.0   b2   25    7.250     b   2
15      P   13.3   b2   15    1.995     b   2
16      N   21.6   b2   20    4.320     b   2
17      J   18.1   h1   15    2.715     h   1
18      K   29.0   h1    5    1.450     h   1
19      P   13.3   h1   10    1.330     h   1
20      N   21.6   h1   10    2.160     h   1
21      J   18.1   h2   50    9.050     h   2
22      K   29.0   h2   50   14.500     h   2
23      P   13.3   h2   55    7.315     h   2
24      N   21.6   h2   55   11.880     h   2
25      J   18.1   h3   30    5.430     h   3
26      K   29.0   h3   40   11.600     h   3
27      P   13.3   h3   20    2.660     h   3
28      N   21.6   h3   20    4.320     h   3
29      J   18.1   h4    5    0.905     h   4
30      K   29.0   h4    5    1.450     h   4
31      P   13.3   h4   15    1.995     h   4
32      N   21.6   h4   15    3.240     h   4
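
As a side note on the reshaping step above: gather() still works, but current tidyr documents it as superseded by pivot_longer(). Here is a sketch of the same step, assuming tidyr 1.0.0 or later. Splitting the names with a regular expression in names_pattern also covers column names that are not exactly one letter plus one digit. The object name reserves.long and the pattern itself are assumptions for illustration; adjust the pattern to your real column names.

```r
library(dplyr)
library(tidyr)

reserves <- read.csv(text = "period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15")

reserves.long <- reserves %>%
  pivot_longer(
    cols = -c(period, amount),
    # split each column name into a letter part and a number part
    names_to = c("lett1", "num"),
    names_pattern = "([a-z]+)([0-9]+)",
    values_to = "prop"
  ) %>%
  mutate(lett = paste0(lett1, num),         # rebuild the full name for labels
         rawvalue = prop * amount / 100)    # percentage scaled to the amount

print(reserves.long)
```

The result has the same columns as reserves.tidied, only sorted by period first instead of by column name, which makes no difference for the plot.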

Then, to plot your tidied data, you want the letters across the x axis and the rawvalue we just calculated (prop * amount / 100) on the y axis. We stack the geom_col up from 1 to 2 or 1 to 4 (the reverse argument of position_stack overrides the default, which would have 2 or 4 at the bottom of the stack). alpha distinguishes the segments within a bar, and fill distinguishes the bars.

Then geom_text labels each stacked segment with its name, a newline, and the original percentage, centered on the segment. The alpha scale reverses the default behavior again, making 1 the darkest and 2 or 4 the lightest in each bar. Finally, you facet by period, making one group of bars for each period.

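  ggplot(reserves.tidied, 
         aes(x = lett1, y = rawvalue, alpha = num, fill = lett1)) +
    # stack segments from 1 at the bottom upwards, with black outlines
    geom_col(position = position_stack(reverse = TRUE), colour = "black") +
    # label each segment at its vertical center
    geom_text(position = position_stack(reverse = TRUE, vjust = .5), 
              aes(label = paste0(lett, ":\n", prop, "%")), alpha = 1) +
    scale_alpha_discrete(range = c(1, .1)) +
    facet_grid(~period) +
    # hide both legends ("none" replaces the deprecated FALSE)
    guides(fill = "none", alpha = "none") 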

Rearranging it so that the "h" bars are different from the "a" and "b" bars is a bit more complex, and you'd have to think about how you want it presented, but it's totally doable.
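
One possible arrangement, as a rough sketch rather than a finished design: keep the "a" and "b" bars scaled to the amount, and put the "h" bars in a second row of facets where they keep their percentages, so every "h" bar has the same fixed height of 100. The names reserves.split, panel, y and p, and the panel labels, are made up for this illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

reserves <- read.csv(text = "period,amount,a1,a2,b1,b2,h1,h2,h3,h4
J,18.1,30,60,40,60,15,50,30,5
K,29,65,35,75,25,5,50,40,5
P,13.3,94,6,85,15,10,55,20,15
N,21.6,95,5,80,20,10,55,20,15")

# Keep the periods in their original (geological) order in the facets
reserves$period <- factor(reserves$period, levels = unique(reserves$period))

reserves.split <- reserves %>%
  pivot_longer(-c(period, amount), names_to = "lett", values_to = "prop") %>%
  mutate(lett1 = substr(lett, 1, 1),
         num = substr(lett, 2, 2),
         # which facet row a bar belongs to
         panel = ifelse(lett1 == "h", "h (% of 100)", "a, b (share of amount)"),
         # a and b are scaled to the amount; h keeps its raw percentage
         y = ifelse(lett1 == "h", prop, prop * amount / 100))

p <- ggplot(reserves.split,
            aes(x = lett1, y = y, alpha = num, fill = lett1)) +
  geom_col(position = position_stack(reverse = TRUE), colour = "black") +
  scale_alpha_discrete(range = c(1, .1)) +
  # one facet row per panel type, each row with its own y scale
  facet_grid(panel ~ period, scales = "free_y") +
  guides(fill = "none", alpha = "none")

print(p)
```

With facet_grid the x axis is shared within each column of facets, so the top row keeps an empty "h" slot and the bottom row keeps empty "a" and "b" slots; whether that is acceptable depends on how you want the figure presented.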