I'm relatively new to R and a complete beginner with ggplot, but I haven't managed to find an answer to the seemingly simple problem I have. Using ggplot, I would like to make a bar chart in which two of three or more graphed factor levels are stacked.

Essentially, this is the type of data I am looking at:

df <- data.frame(Answer=c("good","good","kinda good","kinda good",
  "kinda good","good","bad","good","bad"))

This provides me with a factor with three levels, two of which are very similar:

       Answer
1       good
2       good
3 kinda good
4 kinda good
5 kinda good
6       good
7        bad
8       good
9        bad

If I let ggplot go over these data for me now,

library(ggplot2)
# refer to the column by name inside aes(); avoid df$ and avoid masking base::c
p <- ggplot(df, aes(Answer))
p + geom_bar()

I will get a bar chart with three columns. However, I would like to end up with two columns, one of which should be a stack of the two factor levels "good" and "kinda good", still visibly separated.

I am working with 100 columns of input (study on orthography), which I will need to go through manually, so I would like to make the code as easily adjustable as possible. Some of them have more than ten levels, and I would need to sort them into three columns. Therefore, in most cases my data would more likely look like this:

df <- data.frame(Answer=c("good","goood","goo0d","good",
  "I don't know","Bad","bad","baaad","really bad"))

I would consequently group this into three categories. In approximately half of the cases, I could probably still filter using pattern matching because I will be looking at the use of spaces. The other half, however, is looking at capitalization, which would get a little messy, or at least very tedious.

I have thought of two different approaches to solve this issue more efficiently:

I could simply rewrite the factor levels, but this would result in a loss of information (and I would like to keep the two levels separate). I would like to keep the original level names because I think I need them to graph the ratio within that stacked column and to label the column properly.

I could split the respective column/factor into two separate columns/factors and graph them next to each other, and thus create a "fake" third dimension. This is looking to be the most promising approach, but before I work through 100 columns of data with this - is there a more elegant approach, maybe within the ggplot2 package, where I could just point/group the level names instead of changing/reordering the data frame behind it?

Thanks!

ggplot(df, aes(grepl('good', Answer), fill = Answer)) + geom_bar() will get it roughly. basically it is putting the groups on the x (good or no good) and coloring by Answer - rawr
Thanks for replying so quickly, for editing, and of course for the answer! This works really well with two groups! Since you only posted your answer as a comment, I assume there could be another solution if I need to group ten levels into three columns? - Sarah
I try not to encourage ggplot :} grouping several groups into fewer is not a ggplot problem. if you want to make n-groups, grepl won't be extremely useful. if you post some data that is more like what you need (rather than just two levels), that would help - rawr
Thank you - what would you encourage instead? I added some more details on my data format. Looks a lot like I will actually have to change my data architecture rather than tell ggplot to pluck information from different places. - Sarah

1 Answer

You can try the following for a more automated approach to grouping the answers.

We select some keywords based on your data and loop over them to see which answers contain each keyword.

groups <- c('good','bad','ugly','know')

df <- data.frame(Answer=c("good","medium good","kinda good","still good",
                          "I don't know","good","bad","good","really bad"))

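# idx is a logical matrix: one row per answer, one column per keyword
idx <- sapply(groups, function(x) grepl(x, df$Answer, ignore.case = TRUE))
# pick the matching keyword for each row; this assumes every answer matches exactly one keyword
df$group <- rep(colnames(idx), nrow(idx))[t(idx)]
df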

#         Answer group
# 1         good  good
# 2  medium good  good
# 3   kinda good  good
# 4   still good  good
# 5 I don't know  know
# 6         good  good
# 7          bad   bad
# 8         good  good
# 9   really bad   bad
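
The indexing step above only lines up if each answer matches exactly one keyword. With messier data it may be worth checking that first. The following is a minimal sketch of such a check, plus a more forgiving fallback; the names `n_matches` and `first_match` are made up for illustration and are not part of the original answer.

```r
groups <- c('good', 'bad', 'ugly', 'know')
df <- data.frame(Answer = c("good", "medium good", "kinda good", "still good",
                            "I don't know", "good", "bad", "good", "really bad"))

idx <- sapply(groups, function(x) grepl(x, df$Answer, ignore.case = TRUE))

# Number of keywords matched by each answer; the indexing trick needs exactly 1.
n_matches <- rowSums(idx)
stopifnot(all(n_matches == 1))

# Fallback that tolerates unmatched or multiply matched answers:
# take the first matching keyword, or NA if none matches.
first_match <- apply(idx, 1, function(row) {
  if (any(row)) groups[which(row)[1]] else NA_character_
})
```

If the `stopifnot()` line fails, inspecting `df$Answer[n_matches != 1]` should show which answers need a new keyword or manual handling.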


library('ggplot2')
ggplot(df, aes(group, fill = Answer)) + geom_bar()
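
For the columns where pattern matching gets messy (typos, capitalization), one possible alternative is to spell the grouping out in a named vector and look each answer up in it. This is only a sketch using the second example from the question; the `lookup` vector and the group label "unsure" are assumptions chosen for illustration.

```r
library(ggplot2)

df <- data.frame(Answer = c("good", "goood", "goo0d", "good",
                            "I don't know", "Bad", "bad", "baaad", "really bad"))

# Hypothetical hand-written lookup: names are the original answers,
# values are the column each answer should be stacked in.
lookup <- c("good" = "good", "goood" = "good", "goo0d" = "good",
            "I don't know" = "unsure",
            "Bad" = "bad", "bad" = "bad", "baaad" = "bad", "really bad" = "bad")

# as.character() guards against Answer being a factor
# (indexing with a factor would use its integer codes).
df$group <- unname(lookup[as.character(df$Answer)])

# Any answer missing from the lookup becomes NA, so fail early.
stopifnot(!anyNA(df$group))

# Original answers stay intact and are used for the fill.
ggplot(df, aes(group, fill = Answer)) + geom_bar()
```

The original `Answer` column is left untouched, so only the `lookup` vector has to be adjusted from one input column to the next.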