I am graphing some data with ggplot, but I don't understand an error I get with data that is only slightly different from data I can graph successfully. For example, this data graphs successfully:

to_graph <- structure(list(Teacher = c("BS", "BS", "FA"
), Level = structure(c(2L, 1L, 1L), .Label = c("BE", "AE", "ME", 
"EE"), class = "factor"), Count = c(2L, 25L, 28L)), .Names = c("Teacher", 
"Level", "Count"), row.names = c(NA, 3L), class = "data.frame")

ggplot(data=to_graph, aes(x=Teacher, y=Count, fill=Level), ordered=TRUE) +
       geom_bar(aes(fill = Level), position = 'fill') +
       scale_y_continuous("",formatter="percent") +
       scale_fill_manual(values = c("#FF0000", "#FFFF00","#00CC00", "#0000FF")) +
       opts(axis.text.x=theme_text(angle=45)) + 
       opts(title = "Score Distribution")

But this does not:

to_graph <- structure(list(School = c(84351L, 84384L, 84385L, 84386L, 84387L, 
84388L, 84389L, 84397L, 84398L, 84351L, 84384L, 84385L, 84386L, 
84387L, 84388L, 84389L, 84397L, 84398L, 84351L, 84386L), Level = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 3L, 3L), .Label = c("BE", "AE", "ME", "EE"), class = "factor"), 
    Count = c(3L, 7L, 5L, 4L, 3L, 4L, 4L, 6L, 2L, 116L, 138L, 
    147L, 83L, 76L, 81L, 83L, 85L, 53L, 1L, 1L)), .Names = c("School", 
"Level", "Count"), row.names = c(NA, 20L), class = "data.frame")

ggplot(data=to_graph, aes(x=School, y=Count, fill=Level), ordered=TRUE) +
       geom_bar(aes(fill = Level), position = 'fill') +
       scale_y_continuous("",formatter="percent") +
       scale_fill_manual(values = c("#FF0000", "#FFFF00","#00CC00", "#0000FF")) +
       opts(axis.text.x=theme_text(angle=90)) + 
       opts(title = "Score Distribution")

With the latter code, I get this error:

stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this. Error in if (!all(data$ymin == 0)) warning("Filling not well defined when ymin != 0") : missing value where TRUE/FALSE needed

Anyone know what's going on here? Thank you!

1 Answer

The error occurs because your x variable, School, is numeric, whereas you want it treated as discrete. Use x=factor(School).

The reason is that stat_bin, the default stat for geom_bar, summarises the data for each unique value of x when x is discrete. When x is numeric, it instead treats the variable as continuous and bins across its whole range (hence the message "binwidth defaulted to range/30"). This is clearly not what you need here.
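To see the difference without plotting anything, here is a small base R sketch using the School column from the question. The variable names span and n_schools are just illustrative.

```r
# School IDs copied from the question's second data frame (stored as integers)
School <- c(84351L, 84384L, 84385L, 84386L, 84387L, 84388L, 84389L,
            84397L, 84398L, 84351L, 84384L, 84385L, 84386L, 84387L,
            84388L, 84389L, 84397L, 84398L, 84351L, 84386L)

class(School)                        # "integer", so ggplot sees a continuous x

# Width of the numeric range that would be split into bins
span <- diff(range(School))          # 47

# Number of discrete categories once School is converted to a factor
n_schools <- nlevels(factor(School)) # 9
```

As a factor, each of the 9 schools becomes its own category on the x axis instead of a position on a continuous scale 47 units wide.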

ggplot(data=to_graph, aes(x=factor(School), y=Count, fill=Level), ordered=TRUE) + 
    geom_bar(aes(fill = Level), position='fill') + 
    opts(axis.text.x=theme_text(angle=90)) + 
    scale_y_continuous("",formatter="percent") + 
    opts(title = "Score Distribution") + 
    scale_fill_manual(values = c("#FF0000", "#FFFF00","#00CC00", "#0000FF"))

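A note for anyone running a current ggplot2: as far as I know, opts(), theme_text() and the formatter argument were deprecated and later removed, so the code in this thread will not run unchanged. Below is a rough modern equivalent of the same fix, assuming ggplot2 2.2.0 or later (for geom_col) and the scales package. The data frame is rebuilt from the question's values, and geom_col is swapped in for geom_bar because Count is already a total per row; treat it as a sketch rather than the original answer's method.

```r
library(ggplot2)

# Same values as the question's second data frame
to_graph <- data.frame(
  School = c(84351L, 84384L, 84385L, 84386L, 84387L, 84388L, 84389L,
             84397L, 84398L, 84351L, 84384L, 84385L, 84386L, 84387L,
             84388L, 84389L, 84397L, 84398L, 84351L, 84386L),
  Level = factor(c(rep("AE", 9), rep("BE", 9), "ME", "ME"),
                 levels = c("BE", "AE", "ME", "EE")),
  Count = c(3L, 7L, 5L, 4L, 3L, 4L, 4L, 6L, 2L, 116L, 138L,
            147L, 83L, 76L, 81L, 83L, 85L, 53L, 1L, 1L)
)

# factor(School) makes the x axis discrete: one bar per school
p <- ggplot(to_graph, aes(x = factor(School), y = Count, fill = Level)) +
  geom_col(position = "fill") +                       # stack to proportions
  scale_y_continuous("", labels = scales::percent) +  # replaces formatter="percent"
  scale_fill_manual(values = c("#FF0000", "#FFFF00", "#00CC00", "#0000FF")) +
  labs(title = "Score Distribution", x = "School") +  # replaces opts(title = ...)
  theme(axis.text.x = element_text(angle = 90))       # replaces opts()/theme_text()

print(p)
```

The key change is still the same as above: wrapping School in factor() so that it is treated as discrete.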