I'm relatively new to R and a complete beginner with ggplot, but I haven't managed to find an answer to the seemingly simple problem I have. Using ggplot, I would like to make a bar chart in which two of three or more graphed factor levels are stacked.
Essentially, this is the type of data I am looking at:
df <- data.frame(Answer=c("good","good","kinda good","kinda good",
"kinda good","good","bad","good","bad"))
This provides me with a factor with three levels, two of which are very similar:
Answer
1 good
2 good
3 kinda good
4 kinda good
5 kinda good
6 good
7 bad
8 good
9 bad
If I let ggplot go over these data for me now,
library(ggplot2)
p <- ggplot(df, aes(x = Answer))  # refer to the column by name inside aes(), not df$Answer
p + geom_bar()
I will get a bar chart with three columns. However, I would like to end up with two columns, one of which should be a stack of the two factor levels "good" and "kinda good", still visibly separated.
I am working with 100 columns of input (study on orthography), which I will need to go through manually, so I would like to make the code as easily adjustable as possible. Some of them have more than ten levels, and I would need to sort them into three columns. Therefore, in most cases my data would more likely look like this:
df <- data.frame(Answer=c("good","goood","goo0d","good",
"I don't know","Bad","bad","baaad","really bad"))
I would consequently group this into three categories. In approximately half of the cases, I could probably still filter using pattern matching because I will be looking at the use of spaces. The other half, however, is looking at capitalization, which would get a little messy, or at least very tedious.
I have thought of two different approaches to solve this issue more efficiently:
1. Simply rewriting the factor levels, but this would result in a loss of information (and I would like to keep the two levels separate). I would like to keep the original level names because I think I need them to graph the ratio within that stacked column and to label the column properly.
2. I could split the respective column/factor into two separate columns/factors and graph them next to each other, and thus create a "fake" third dimension. This looks like the most promising approach, but before I work through 100 columns of data with it: is there a more elegant approach, maybe within the ggplot2 package, where I could just point to or group the level names instead of changing or reordering the data frame behind it?
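For reference, the second approach might be sketched roughly like this on the small example above. The `Group` column and the label "good-ish" are made-up names for illustration; the original `Answer` column stays untouched and is only used for the fill.

```r
library(ggplot2)

df <- data.frame(Answer = c("good", "good", "kinda good", "kinda good",
                            "kinda good", "good", "bad", "good", "bad"))

# Hypothetical helper column: which bar each answer belongs to.
# The original Answer column is left as it is.
df$Group <- ifelse(df$Answer %in% c("good", "kinda good"), "good-ish", "bad")

# One bar per Group; geom_bar() stacks by default, so the original
# levels stay visibly separated through the fill colour.
p <- ggplot(df, aes(x = Group, fill = Answer)) +
  geom_bar()
print(p)
```

This gives two bars, with the "good-ish" bar split into "good" and "kinda good".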
Thanks!
ggplot(df, aes(grepl('good', Answer), fill = Answer)) + geom_bar()
will get it roughly. basically it is putting the groups on the x (good or no good) and coloring by Answer – rawr
grepl won't be extremely useful. if you post some data that is more like what you need (rather than just two levels), that would help – rawr
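Where pattern matching gets unwieldy, one possible alternative is an explicit lookup from each raw level to the bar it should land in, kept as a named character vector. The following is only a sketch under assumptions: `lookup` and `plot_grouped` are hypothetical names, and the assignment of levels to groups is a guess at what the question intends.

```r
library(ggplot2)

df <- data.frame(Answer = c("good", "goood", "goo0d", "good",
                            "I don't know", "Bad", "bad", "baaad", "really bad"))

# Hypothetical lookup: raw level -> bar. The grouping is assumed for illustration.
lookup <- c("good" = "good", "goood" = "good", "goo0d" = "good",
            "I don't know" = "unsure",
            "Bad" = "bad", "bad" = "bad", "baaad" = "bad", "really bad" = "bad")

# Hypothetical helper: one bar per group, stacked by the original level.
plot_grouped <- function(data, column, lookup) {
  raw <- as.character(data[[column]])
  # Fail early if a level has not been assigned to any group.
  missing <- setdiff(unique(raw), names(lookup))
  if (length(missing) > 0) {
    stop("No group defined for: ", paste(missing, collapse = ", "))
  }
  plot_data <- data.frame(Group = unname(lookup[raw]), Level = raw)
  ggplot(plot_data, aes(x = Group, fill = Level)) +
    geom_bar() +
    labs(x = column, fill = "Original level")
}

p <- plot_grouped(df, "Answer", lookup)
print(p)
```

With many input columns, only the lookup vector would need adjusting per column; the data frame itself is not modified.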