
I'm relatively new to R and a complete beginner with ggplot, and I haven't managed to find an answer to what seems like a simple problem. Using ggplot, I would like to make a bar chart in which two of three (or more) factor levels are stacked into a single bar.

Essentially, this is the type of data I am looking at:

df <- data.frame(Answer=c("good","good","kinda good","kinda good",
  "kinda good","good","bad","good","bad"))

This provides me with a factor with three levels, two of which are very similar:

       Answer
1       good
2       good
3 kinda good
4 kinda good
5 kinda good
6       good
7        bad
8       good
9        bad
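Since R 4.0.0, data.frame() no longer converts strings to factors by default, so on a current R installation Answer is a character vector rather than a factor. A small sketch that makes the factor explicit and checks its levels and counts:

```r
df <- data.frame(Answer = c("good", "good", "kinda good", "kinda good",
  "kinda good", "good", "bad", "good", "bad"))

# Since R 4.0.0 data.frame() keeps strings as character, so convert explicitly
df$Answer <- factor(df$Answer)

levels(df$Answer)  # the distinct levels, sorted alphabetically
table(df$Answer)   # how many answers fall into each level
```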

If I plot these data directly with ggplot,

library(ggplot2)
p <- ggplot(df, aes(Answer))
p + geom_bar()


I will get a bar chart with three columns. However, I would like to end up with two columns, one of which should be a stack of the two factor levels "good" and "kinda good", still visibly separated.

I am working with 100 columns of input (from a study on orthography) that I will need to go through manually, so I would like the code to be as easy to adjust as possible. Some of the columns have more than ten levels, which I would need to sort into three bars. In most cases, my data would therefore look more like this:

df <- data.frame(Answer=c("good","goood","goo0d","good",
  "I don't know","Bad","bad","baaad","really bad"))

I would then group these into three categories. In roughly half of the cases I could probably still filter with pattern matching, because those columns look at the use of spaces. The other half, however, deals with capitalization, which would get messy, or at least very tedious.
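Where pattern matching gets tedious (for example when case matters), one option is to list the spellings of each category explicitly. The sketch below uses base R's levels<- with a named list, which merges several old levels into one new level. It works on a copy (group) so the original Answer column survives for colouring. The category names good, bad and unknown are hypothetical; any level not listed ends up as NA.

```r
df <- data.frame(Answer = c("good", "goood", "goo0d", "good",
  "I don't know", "Bad", "bad", "baaad", "really bad"))

# Hypothetical assignment of every observed spelling to one of three bars.
# Case matters here ("Bad" vs "bad"), so each spelling is listed explicitly.
categories <- list(
  good    = c("good", "goood", "goo0d"),
  bad     = c("Bad", "bad", "baaad", "really bad"),
  unknown = "I don't know"
)

# Copy the answers into a new factor and merge its levels; the original
# Answer column is left untouched so it can still be used for fill.
df$group <- factor(df$Answer)
levels(df$group) <- categories

table(df$group)
# Any spelling missing from the list above would show up as NA here
stopifnot(!anyNA(df$group))
```

Only the categories list needs to change from column to column.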

I have thought of two different approaches to solve this issue more efficiently:

1. Simply rewriting the factor levels. This would lose information, though: I would like to keep the two levels separate, and I think I need the original level names to graph the ratio within the stacked column and to label the column properly.

2. Splitting the respective column (factor) into two separate columns and graphing them next to each other, thus creating a "fake" third dimension. This looks like the most promising approach, but before I work through 100 columns of data this way: is there a more elegant approach, perhaps within the ggplot2 package, where I could simply point to and group the level names instead of changing or reordering the data frame behind them?
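A minimal sketch of grouping "inside ggplot" without touching the data frame, assuming a hand-written lookup vector (bar_of, a hypothetical name) that maps each original answer to the bar it belongs to. The mapping is evaluated inside aes(), and fill = Answer keeps the original levels as visible stacked segments.

```r
library(ggplot2)

df <- data.frame(Answer = c("good", "good", "kinda good", "kinda good",
  "kinda good", "good", "bad", "good", "bad"))

# Hypothetical lookup table: names are the original answers, values are the
# bar each answer should be stacked into. Only this vector changes per column.
bar_of <- c("good" = "good", "kinda good" = "good", "bad" = "bad")

# The bar is computed inside aes(), so df is not modified. as.character()
# makes the lookup work whether Answer is stored as character or as factor.
# fill = Answer keeps the original levels visible as stacked segments.
p <- ggplot(df, aes(x = unname(bar_of[as.character(Answer)]), fill = Answer)) +
  geom_bar() +
  labs(x = "Answer group")
print(p)
```

Answers missing from the lookup end up in an NA bar, which makes unassigned spellings easy to spot.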

Thanks!

ggplot(df, aes(grepl('good', Answer), fill = Answer)) + geom_bar() will get you roughly there. Basically, it puts the groups on the x axis (good or not good) and colors the bars by Answer. – rawr
Thanks for replying so quickly, for editing, and of course for the answer! This works really well with two groups! Since you only posted your answer as a comment, I assume there could be another solution if I need to group ten levels into three columns? – Sarah
I try not to encourage ggplot :} Grouping several groups into fewer is not a ggplot problem. If you want to make n groups, grepl won't be very useful. If you post some data that is closer to what you need (rather than just two levels), that would help. – rawr
Thank you. What would you encourage instead? I added some more details on my data format. It looks a lot like I will actually have to change my data structure rather than tell ggplot to pluck information from different places. – Sarah
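The one-liner from the first comment puts TRUE and FALSE on the x axis. A slightly more readable variant of the same idea, sketched here with ifelse() to turn the pattern match into axis labels ("good" and "not good" are just illustrative labels):

```r
library(ggplot2)

df <- data.frame(Answer = c("good", "good", "kinda good", "kinda good",
  "kinda good", "good", "bad", "good", "bad"))

# Same idea as the comment: the x position comes from a pattern match,
# while fill keeps the original answers, so "good" and "kinda good" stack.
# ifelse() turns TRUE/FALSE into readable axis labels.
p <- ggplot(df, aes(x = ifelse(grepl("good", Answer), "good", "not good"),
                    fill = Answer)) +
  geom_bar() +
  labs(x = NULL)
print(p)
```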

1 Answer


You can try the following for a more automated approach to grouping the answers.

We select some keywords based on your data and loop over them to see which answers contain each keyword.

groups <- c('good','bad','ugly','know')

df <- data.frame(Answer=c("good","medium good","kinda good","still good",
                          "I don't know","good","bad","good","really bad"))

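# logical matrix: one row per answer, one column per keyword
idx <- sapply(groups, function(x) grepl(x, df$Answer, ignore.case = TRUE))
# walk the matrix row by row and keep the keyword that matched each answer
# (assumes every answer matches exactly one keyword)
df$group <- rep(colnames(idx), nrow(idx))[t(idx)]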
df

#         Answer group
# 1         good  good
# 2  medium good  good
# 3   kinda good  good
# 4   still good  good
# 5 I don't know  know
# 6         good  good
# 7          bad   bad
# 8         good  good
# 9   really bad   bad


library('ggplot2')
ggplot(df, aes(group, fill = Answer)) + geom_bar()

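One caveat: the rep(...)[t(idx)] line assumes every answer matches exactly one keyword. With spellings like "goood" or "baaad" from the question, some answers match nothing, so the result is shorter than the data frame and the assignment either fails or silently recycles values into the wrong rows. A more defensive sketch takes the first matching keyword per row and leaves NA where nothing matched, which also flags the spellings that still need to be assigned by hand:

```r
groups <- c("good", "bad", "ugly", "know")

df <- data.frame(Answer = c("good", "goood", "goo0d", "good",
  "I don't know", "Bad", "bad", "baaad", "really bad"))

# logical matrix: one row per answer, one column per keyword
idx <- sapply(groups, function(x) grepl(x, df$Answer, ignore.case = TRUE))

# For each row, take the first keyword that matched; rows with no match get NA
df$group <- apply(idx, 1, function(hit) {
  if (any(hit)) names(hit)[which(hit)[1]] else NA_character_
})

df
# Answers that still need a manual category
df$Answer[is.na(df$group)]
```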