
Following the answer to my previous question, another question has come up:

How can I plot a stacked bar chart whose fill colour depends on another category, without reshaping the data, while using stat="identity" to sum up the values for each stacked area?

stat="identity" works nicely to sum up the values for non-stacked columns. In a stacked column, however, the stacking is somehow "multiplied" or "striped" (see the picture below).

Some sample data:

element <- rep("apples", 15)
qty <- c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6)
category1 <- c("Red", "Green", "Red", "Green", "Yellow")
category2 <- c("small","big","big","small","small")
d <- data.frame(element=element, qty=qty, category1=category1, category2=category2)

This gives the following table:

id  element  qty category1 category2
1   apples   2       Red     small
2   apples   1     Green       big
3   apples   4       Red       big
4   apples   3     Green     small
5   apples   6    Yellow     small
6   apples   2       Red     small
7   apples   1     Green       big
8   apples   4       Red       big
9   apples   3     Green     small
10  apples   6    Yellow     small
11  apples   2       Red     small
12  apples   1     Green       big
13  apples   4       Red       big
14  apples   3     Green     small
15  apples   6    Yellow     small

Then:
ggplot(d, aes(x=category1, y=qty, fill=category2)) + geom_bar(stat="identity")

But the graph is a bit messy: the colors aren't grouped together!

[Image: ggplot graph is striped]

Why does it behave like this?

Is there still a way to group the colors correctly without reshaping my data?

Why is reshaping out of the question? stat = "identity" will just draw what you give it, which in your case is a messy dataset. You'll have to process the table yourself to get the desired result (though I don't quite understand what that should look like). – Roman Luštrik
I'm trying to keep the code as light as possible so I can embed it in a PHP plugin for a website (tikiwiki CMS), so that users who don't know R can still customize some statistics from their data. Also, since a page can contain many plugins, I want to keep the server load for showing stats low... That said, reshaping is not really out of the question :) – Joel.O
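Roman Luštrik's point that stat = "identity" draws exactly what it is given can be seen in a small base-R sketch that needs no ggplot2. For each bar, it lists the category2 value of every row in data order. It assumes, as the striped picture suggests, that the ggplot2 version used here stacks rows in the order they appear in the data frame. The names `segments` and `changes` are illustrative only.

```r
# Rebuild the sample data from the question
# (category1 and category2 have length 5 and are recycled to fill 15 rows)
element <- rep("apples", 15)
qty <- c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6)
category1 <- c("Red", "Green", "Red", "Green", "Yellow")
category2 <- c("small", "big", "big", "small", "small")
d <- data.frame(element = element, qty = qty,
                category1 = category1, category2 = category2,
                stringsAsFactors = FALSE)

# For each bar (category1), the fill value of every row, in data order.
# If rows are stacked in this order, alternating values mean alternating colours.
segments <- split(d$category2, d$category1)
print(segments)

# Number of colour changes from one stacked row to the next within each bar
changes <- sapply(segments, function(s) sum(s[-1] != s[-length(s)]))
print(changes)
```

Red and Green alternate between "small" and "big" on every row, which matches the stripes. Yellow has a single colour, so it looks solid.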

2 Answers


One way would be to order your data by category2. This can also be done inside the ggplot() call:

ggplot(d[order(d$category2),], aes(x=category1, y=qty, fill=category2)) + 
             geom_bar(stat="identity")
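Running the same kind of check after sorting, as this answer does, shows why ordering helps. Rows that share a category2 value end up next to each other, so each bar should contain exactly one contiguous run per colour. This is again a base-R sketch under the assumption that rows are stacked in data order. `d_sorted` and `runs` are illustrative names.

```r
# Sample data from the question
element <- rep("apples", 15)
qty <- c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6)
category1 <- c("Red", "Green", "Red", "Green", "Yellow")
category2 <- c("small", "big", "big", "small", "small")
d <- data.frame(element = element, qty = qty,
                category1 = category1, category2 = category2,
                stringsAsFactors = FALSE)

# Sort rows by category2, as in the answer, so rows sharing a fill are adjacent
d_sorted <- d[order(d$category2), ]

# Within each bar, count runs of identical fill values;
# one run per distinct category2 value means no striping
runs <- sapply(split(d_sorted$category2, d_sorted$category1),
               function(s) length(rle(s)$lengths))
print(runs)
```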

I used this solution for a while, but on my large databases (60,000 entries) the ordered stacked bars drawn by ggplot2 showed some white gaps between the bars, depending on the zoom level. I'm not sure where this issue comes from, but my wild guess is that I'm stacking too many bars :p.

Aggregating the data with plyr solved the problem:

element <- rep("apples", 15)
qty <- c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6)
category1 <- c("Red", "Green", "Red", "Green", "Yellow")
category2 <- c("small","big","big","small","small")
d <- data.frame(element=element, qty=qty, category1=category1, category2=category2)

plyr:

library(plyr)
d <- ddply(d, .(category1, category2), summarize, qty=sum(qty, na.rm = TRUE))

A brief explanation of this call:

ddply(1, .(2, 3), summarize, 4=5(6, na.rm = TRUE))

1: the data frame name
2, 3: the columns to keep, i.e. the grouping factors to compute by
summarize: creates a new, aggregated data frame (unlike transform, which adds columns to the existing one)
4: the name of the calculated column
5: the function to apply, here sum()
6: the column the function is applied to

4, 5 and 6 can be repeated for more calculated fields...
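If the plyr dependency is a concern, for example with the lightweight plugin mentioned in the comments, base R's `aggregate()` with a formula computes the same grouped sum. This sketch substitutes for the `ddply()` call. It is not what this answer used. The result is re-sorted so its rows line up with the `ddply()` output. `d_agg` is an illustrative name.

```r
# Sample data from the question
element <- rep("apples", 15)
qty <- c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6)
category1 <- c("Red", "Green", "Red", "Green", "Yellow")
category2 <- c("small", "big", "big", "small", "small")
d <- data.frame(element = element, qty = qty,
                category1 = category1, category2 = category2,
                stringsAsFactors = FALSE)

# Grouped sum: one row per category1 x category2 combination present in the data
# (na.rm = TRUE is passed on to sum(), mirroring the ddply call)
d_agg <- aggregate(qty ~ category1 + category2, data = d, FUN = sum, na.rm = TRUE)

# aggregate() varies the first grouping factor fastest; sort to match ddply's layout
d_agg <- d_agg[order(d_agg$category1, d_agg$category2), ]
rownames(d_agg) <- NULL
print(d_agg)
```

The aggregated `d_agg` can then be passed to the same `ggplot()` call in place of `d`.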

ggplot2:

ggplot(d, aes(x=category1, y=qty, fill=category2)) + geom_bar(stat="identity")

So now, as suggested by Roman Luštrik, the data is aggregated to match the graph to be shown.

Indeed, after applying ddply the data is cleaner:

  category1 category2 qty
1     Green       big   3
2     Green     small   9
3       Red       big  12
4       Red     small   6
5    Yellow     small  18
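Aggregation only merges segments, so it should not change the height of any bar. A quick base-R check compares the total qty per category1 before and after aggregating. It again uses `aggregate()` as a stand-in for `ddply()`, and the names `before` and `after` are illustrative.

```r
# Sample data from the question
element <- rep("apples", 15)
qty <- c(2, 1, 4, 3, 6, 2, 1, 4, 3, 6, 2, 1, 4, 3, 6)
category1 <- c("Red", "Green", "Red", "Green", "Yellow")
category2 <- c("small", "big", "big", "small", "small")
d <- data.frame(element = element, qty = qty,
                category1 = category1, category2 = category2,
                stringsAsFactors = FALSE)
d_agg <- aggregate(qty ~ category1 + category2, data = d, FUN = sum)

# Bar heights: total qty per category1 in the raw and in the aggregated data
before <- tapply(d$qty, d$category1, sum)
after <- tapply(d_agg$qty, d_agg$category1, sum)
print(rbind(before, after))

# Each bar keeps its height but is now built from at most one row per colour
stopifnot(isTRUE(all.equal(before, after)))
print(table(d_agg$category1))
```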

I finally understood how to manage my dataset thanks to this really great source of information: http://jaredknowles.com/r-bootcamp and https://dl.dropbox.com/u/1811289/RBootcamp/slides/Tutorial3_DataSort.html

And this one too: http://streaming.stat.iastate.edu/workshops/r-intro/lectures/6-advancedmanipulation.pdf

... if only because ?ddply is a bit strange: the examples differ from the explanation of the arguments, and it seems to say nothing about the shorthand notation... But I may have missed something...