
I am trying to understand the behavior of the "fill" argument in geom_polygon for ggplot.

I have a dataframe with multiple values of a measure of interest, obtained in different counties of each state. I merged my data with the coordinates from the "maps" package and then built the plot with ggplot. I don't understand how ggplot chooses which color to show for a state when several different numbers are provided in the fill variable (mean? median? interpolation?).

Reproducing a piece of my dataframe to explain what I mean:

state=rep("Alabama",3)
counties=c("Russell","Clay","Montgomery")
long=c(-87.46201,-87.48493,-87.52503)
lat=c(30.38968,30.37249,30.33239)
group=rep(1,3)
measure=c(22,28,17)
df=data.frame(state, counties, long,lat,group,measure)

Call for ggplot

library(ggplot2)
p <- ggplot()
p <- p + geom_polygon(data=df, aes(x=long, y=lat, group=group, fill=measure), colour="black")
print(p)

In the full dataframe I have hundreds of rows: each of the 17 counties is repeated across the whole set of coordinates for the Alabama polygon. How is it that ggplot fills the state with a single color?

Again, I would assume it is somehow interpolating the fill values provided at each set of coordinates, but I am not sure about it.

Thanks everyone for the help.

Side note, I think you should be using numbers for longitude instead of characters, which is what is created when quotes are used. - Jon Spring
Definitely, thanks. Actually I had already modified it but copied and pasted the old version. Edited. - Sfavillotto

1 Answer


Through trial and error, it looks like the first value of the fill mapping is used for the fill of the polygon, while the range of the fill scale takes all values into account. This makes sense because the documentation doesn't mention any aggregation. I agree that an aggregate function would also make sense, but if that were the implementation I would expect the aggregation function to be set via an argument.

Instead, the documentation shows an example (and recommends) starting with two data frames, one of which has coordinates for each vertex, and one which has a single row (and single fill value) per polygon, and joining them based on an ID column.
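A minimal sketch of that two-data-frame layout, in case it helps. The names `values`, `positions`, and `datapoly`, as well as the ids and numbers, are made up for illustration and are not taken from the question's data:

```r
library(ggplot2)

# One row per polygon: the id and the single value to use as fill.
# (Ids and values are invented for this sketch.)
values <- data.frame(
  id    = c("a", "b"),
  value = c(3, 7)
)

# One row per vertex: id says which polygon the vertex belongs to.
positions <- data.frame(
  id = rep(c("a", "b"), each = 3),
  x  = c(1, 1, 2,  2, 3, 3),
  y  = c(1, 2, 2,  1, 1, 2)
)

# Join on id so every vertex of a polygon carries the same fill value.
# These are triangles, so vertex order is harmless here; for larger
# polygons, keep an explicit order column and sort by it after the join.
datapoly <- merge(values, positions, by = "id")

p <- ggplot(datapoly, aes(x = x, y = y)) +
  geom_polygon(aes(fill = value, group = id), colour = "black")
print(p)
```

With this layout there is exactly one fill value per polygon, so the question of which row "wins" never comes up.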

Here's a demonstration:

long=c(1, 1, 2)
lat=c(1, 2, 2)
group=rep(1,3)
df=data.frame(long,lat,group,
              m1 = c(1, 1, 1),
              m2 = c(1, 2, 3),
              m3 = c(3, 1, 2),
              m4 = c(1, 10, 11),
              m5 = c(1, 5, 11),
              m6 = c(11, 1, 10))
library(ggplot2)
plots = lapply(paste0("m", 1:6), function(f)
  ggplot(df, aes(x = long, y = lat, group = group)) +
    geom_polygon(aes_string(fill = f)) +
    # the title shows the column name followed by its three fill values
    labs(title = sprintf("%s: %s", f, toString(df[[f]])))
  )

do.call(gridExtra::grid.arrange, plots)

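If what you actually want is one summary value per state, one option is to compute it yourself before plotting, so that every row of a polygon carries the same number. A rough sketch, reusing the question's column names; the second state, the coordinates, and the choice of the mean as the summary are all assumptions made for illustration:

```r
library(ggplot2)

# Toy data with the question's column names; "Georgia" and all
# coordinates are invented for this sketch.
state   <- rep(c("Alabama", "Georgia"), each = 3)
long    <- c(1, 1, 2,  3, 3, 4)
lat     <- c(1, 2, 2,  1, 2, 2)
group   <- rep(c(1, 2), each = 3)
measure <- c(22, 28, 17,  10, 20, 30)
df <- data.frame(state, long, lat, group, measure)

# ave() returns the per-state mean, repeated on every row of that state.
df$state_mean <- ave(df$measure, df$state, FUN = mean)

p <- ggplot(df, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = state_mean), colour = "black")
print(p)
```

Swapping `FUN = mean` for `median` or any other summary makes the aggregation explicit instead of depending on row order.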