2
votes

I am having a problem with ggplot2 not plotting a variable (Water) on a stacked bar chart.

Here is the data:

data <- structure(list(PARK_NON = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("apis", "indu", "miss", 
"non_apis", "non_indu", "non_miss", "non_piro", "non_sacn", "non_slbe", 
"piro", "sacn", "slbe"), class = "factor"), variable = structure(c(15L, 
9L, 3L, 10L, 12L, 7L, 6L, 11L, 8L, 2L, 5L, 1L, 14L, 4L, 13L, 
15L, 10L, 12L, 6L, 11L, 7L, 9L, 5L, 1L, 8L, 2L, 4L, 13L, 3L, 
14L), .Label = c("Agriculture", "Barren land", "Developed - High intensity", 
"Developed - Medium intensity", "Developed - Low intensity", 
"Developed - Open space", "Evergreen forest", "Deciduous forest", 
"Mixed forest", "Herbaceous", "Pasture", "Shrub", "Woody wetland", 
"Herbaceous wetland", "Water"), class = "factor"), perc_veg = c(26.0239390821837, 
0.0293350851750396, 6.90366110126389, 1.21719944965728, 1.57541802496374, 
0.394990724328702, 5.82528684342088, 4.05485247757519, 16.4441745065715, 
1.31842615202185, 9.09594225533093, 4.04411005201813, 4.73410430895216, 
7.12470716561102, 11.213852770926, 7.66680361881418, 2.1247481894809, 
1.30789300876845, 10.5308007720824, 12.723205663498, 0.713438751370985, 
0.0478161985127231, 16.4578439049856, 11.5045071302907, 13.2759304844946, 
0.0640865818499777, 10.2423233639193, 0.795200353968627, 5.43302045035016, 
7.11238152761342), in_out = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Inside", "Outside"
), class = "factor")), class = "data.frame", row.names = c(NA, 
-30L), .Names = c("PARK_NON", "variable", "perc_veg", "in_out"
))

I want data$variable stacked in a particular order, so I define the factor order...

data$variable <- factor(data$variable,
                          levels=c('Agriculture','Barren land','Developed - High intensity','Developed - Medium intensity','Developed - Low intensity',
                                   'Developed - Open space','Evergreen forest','Deciduous forest','Mixed forest','Herbaceous','Pasture','Shrub',
                                   'Woody wetland','Herbaceous wetland','Water'))

Then to plot...

library(ggplot2)
#colors to be used for NLCD classes
wa <- rgb(71/255,107/255,161/255,1)   #water
do <- rgb(222/255,202/255,202/255,1)  #developed-open
dl <- rgb(217/255,148/255,130/255,1)  #developed-low density
dm <- rgb(238/255,0/255,0/255,1)      #developed-medium density
dh <- rgb(171/255,0/255,0/255,1)      #developed-high density
bl <- rgb(179/255,174/255,163/255,1)  #barren land
df <- rgb(104/255,171/255,9/255,1)    #deciduous forest
ef <- rgb(28/255,99/255,48/255,1)     #evergreen forest
mf <- rgb(181/255,202/255,143/255,1)  #mixed forest
sh <- rgb(204/255,186/255,125/255,1)  #shrub/scrub
gr <- rgb(227/255,227/255,194/255,1)  #grassland/herbaceous
pa <- rgb(220/255,217/255,61/255,1)   #pasture/hay
ag <- rgb(171/255,112/255,40/255,1)   #cultivated crops
ww <- rgb(186/255,217/255,235/255,1)  #woody wetlands
hw <- rgb(112/255,163/255,186/255,1)  #emergent/herbaceous wetlands
#map colors defined above to particular factors (levels) w/in the data, then use this vector for the color fill
nlcd.colors <- c("Water"=wa,"Developed - Open space"=do,"Developed - Low intensity"=dl,"Developed - Medium intensity"=dm,"Developed - High intensity"=dh,
       "Barren land"=bl,"Deciduous forest"=df,"Evergreen forest"=ef,"Mixed forest"=mf,"Shrub"=sh,"Herbaceous"=gr,"Pasture"=pa,"Agriculture"=ag,
       "Woody wetland"=ww,"Herbaceous wetland"=hw)

p <- ggplot () + geom_bar(data=data,aes(x=in_out,y=perc_veg,fill=variable,order=variable,width=0.6),stat='identity')
p <- p + scale_fill_manual("Vegetation type",values=nlcd.colors,
                           labels=c('Agriculture','Barren land','Developed - High intensity','Developed - Medium intensity','Developed - Low intensity',
                                    'Developed - Open space','Evergreen forest','Deciduous forest','Mixed forest','Herbaceous','Pasture','Shrub',
                                    'Woody wetland','Herbaceous wetland','Water'))
p <- p + guides(fill=guide_legend(reverse=TRUE)) 
p <- p + ylab('Percent of location')
p <- p + theme(axis.text=element_text(color="black"), axis.ticks.x = element_blank(), axis.title.x = element_blank())
p <- p + scale_y_continuous(expand=c(0,0),limits=c(0,100)) #to remove buffer on either side of 0 and 100

This produces the following graph:

[image: stacked bar chart produced by the code above]

The graph shows 'Water' in the legend, but not in either of the two bars.

Any ideas?

Thanks

-cherrytree

2
Ah, it's because ggplot needs a little extra space at the top. Omit the scale_y_continuous call. -joran
@joran, add this as an answer and I'll mark it as correct. -cherrytree

2 Answers

2
votes

So when you included:

scale_y_continuous(expand=c(0,0),limits=c(0,100))

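ggplot no longer has room for the topmost segment of the stack (Water). With the limits fixed at exactly 0 and 100, a segment whose top edge presumably lands even slightly above 100 is treated as out of range and dropped. That behavior seems debatable to me, though: setting expand to 0 and setting the limits seems like a reasonable thing to do. If you just do: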

scale_y_continuous(expand=c(0,0))

you remove the padding without the truncation. The truncation seems like an unintended result to me. I haven't pored over the source code yet, but I think setting limits is being a little too strict at the boundary.
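To illustrate what "too strict at the boundary" could mean, here is a rough base-R sketch of the stacking arithmetic. It is not ggplot2's actual code: the toy percentages and the helper `censor_range` are made up for illustration, and the sketch assumes that values outside the scale limits are replaced by NA, which is my understanding of the default out-of-bounds handling.

```r
# Toy percentages for a single bar (hypothetical values, not the question's data).
# The last segment pushes the stacked total a hair above 100.
perc <- c(Forest = 60, Pasture = 30, Water = 10 + 1e-9)

# Stacking: each segment occupies [ymin, ymax] on the y axis.
ymax <- cumsum(perc)
ymin <- ymax - perc

# Hypothetical helper: replace anything outside the limits with NA.
censor_range <- function(x, range) {
  x[x < range[1] | x > range[2]] <- NA
  x
}

limits <- c(0, 100)
segments <- data.frame(
  ymin = censor_range(ymin, limits),
  ymax = censor_range(ymax, limits),
  row.names = names(perc)
)

# A segment with a missing edge cannot be drawn.
dropped <- rownames(segments)[!complete.cases(segments)]
print(segments)
print(dropped)
```

If the real data behave the same way, a quick check such as tapply(data$perc_veg, data$in_out, sum) > 100 would show whether either bar's total creeps past the upper limit.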

2
votes

The problem is your last call:

p <- p + scale_y_continuous(expand=c(0,0),limits=c(0,100))

Setting limits on the scale actually operates on the underlying data (values outside the limits are dropped before the plot is drawn) rather than merely clipping the graph at 0 and 100.

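If you still want to remove the whitespace, keep expand=c(0,0) in scale_y_continuous, drop the limits argument, and maybe try

p <- p + coord_cartesian(ylim=c(0,100))

This simply zooms the view to the y limits provided, without discarding any data.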
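For comparison, here is a minimal sketch of the two approaches side by side. The data frame `toy` and its column names are invented for illustration (they are not the question's data), and the behavior described in the comments is what I would expect from a recent ggplot2 version rather than something I have verified against every release.

```r
library(ggplot2)

# Hypothetical one-bar data set whose total is a hair above 100.
toy <- data.frame(
  site = c("Inside", "Inside", "Inside"),
  type = c("Forest", "Pasture", "Water"),
  pct  = c(60, 30, 10 + 1e-9)
)

base <- ggplot(toy, aes(x = site, y = pct, fill = type)) +
  geom_bar(stat = "identity")

# Limits on the scale: stacked values outside [0, 100] are treated as
# out of bounds, so the segment that crosses 100 is expected to go missing.
p_limits <- base +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 100))

# Limits on the coordinate system: the data are left untouched and the
# view is only zoomed; expand = c(0, 0) on the scale removes the padding.
p_zoom <- base +
  scale_y_continuous(expand = c(0, 0)) +
  coord_cartesian(ylim = c(0, 100))
```

Printing p_limits and p_zoom one after the other should make the difference visible: the zoomed version keeps every segment of the bar.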

Also, as an aside, it's always helpful to step back through your code. I find this particularly helpful in ggplot. Strip it back to the bare minimum code needed to get a similar chart (i.e. without colour formatting and label changes etc). You'll then quickly find the offending function call. Then it is easier to target a question.