5
votes

I'd like to use the stat_density_2d function with a categorical variable, but restrict the plot to high-density areas in order to reduce overlap and improve legibility.

Let's take an example with the following data:

library(ggplot2)

plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Label = c(rep('A', 300), rep('B', 150)))

ggplot(plot_data, aes(X, Y, colour = Label)) + geom_point()


With a 2D density plot we obtain overlapping densities:

ggplot(plot_data, aes(X, Y)) + 
  stat_density_2d(geom = "polygon", aes(alpha = ..level.., fill = Label))

2D-density plot

Would it be possible to plot only high-density areas (for instance level > 0.03)? The only solution I found is to "cheat" and manually transform the ..level.. variable, either with a step function or with a power transformation, as in this simple example:

ggplot(plot_data, aes(X, Y)) + 
  stat_density_2d(geom = "polygon", aes(alpha = (..level..) ^ 2, fill = Label)) + 
  scale_alpha_continuous(range = c(0, 1))

2D-density plot with squared levels

Instead of modifying the ..level.. variable, is it possible to ask ggplot2 or the stat_density_2d function to focus only on a certain range of density levels? I've tried playing with the range and limits arguments of scale_alpha_continuous, without any useful result.

Thanks!

2
You can use limits in scale_alpha_continuous, and set a lower and upper bound. Everything outside will simply be ignored (by default). - Axeman
Please make sure that the viewer can understand that the plotted areas do not encompass all data from that group. - Axeman
Thanks for your answer, but I am not sure that will be enough. I agree it would be logical, but as far as I can tell it does not work; see ggplot(plot_data, aes(X,Y))+stat_density_2d(geom="polygon", aes(alpha=..level.., fill=Label)) + scale_alpha_continuous(limits=c(0.1,0.04)). By the way, you are right: this kind of modification must be explicitly stated and explained. - Jonas

2 Answers

4
votes

Option 1
Adding the bins argument to stat_density_2d reduces the number of contour levels drawn. This limits overplotting and draws attention to a small number of density areas with very little code.

set.seed(123)
plot_data <-
  data.frame(
    X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
    Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
    Label = c(rep('A', 300), rep('B', 150))
  )
ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(alpha = ..level.., fill = Label),
                  bins = 4) 

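A related sketch, not part of the original answer: instead of a number of bins, the contour levels themselves can be listed, so that nothing below a chosen density is drawn. This assumes a ggplot2 version in which stat_density_2d forwards a breaks argument to its contouring step and supports after_stat() (the current spelling of ..level..). The cut-off min_level and the break spacing are hypothetical values picked for this simulated data.

```r
library(ggplot2)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

# Hypothetical cut-off: no contour is computed below this density.
min_level <- 0.015
# Explicit contour levels, starting at the cut-off.
level_breaks <- seq(min_level, 0.05, by = 0.005)

p_breaks <- ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(
    geom = "polygon",
    aes(alpha = after_stat(level), fill = Label),
    breaks = level_breaks  # assumed to be passed on to the contour step
  )
p_breaks
```

Breaks above the highest estimated density simply produce no polygon, so the upper end of the sequence can be generous.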

Option 2
Assign the colours manually, using NA for the levels we do not want to plot. The main disadvantage is that we need to know the number of levels and colours in advance (or compute them). In my example with set.seed(123) we need 7.

ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon", aes(fill = as.factor(..level..))) +
  scale_fill_manual(values = c(NA, NA, NA,"#BDD7E7", "#6BAED6", "#3182BD", "#08519C"))

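The "or compute them" part can be sketched as follows. This is an illustration rather than the answer's own code: it assumes layer_data() exposes the computed level column for this layer, and n_hidden (how many of the lowest levels to blank out) is a hypothetical choice. The number of levels is read from the built plot instead of being hard-coded, since it may differ between ggplot2 versions.

```r
library(ggplot2)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

base_plot <- ggplot(plot_data, aes(X, Y, group = Label)) +
  stat_density_2d(geom = "polygon",
                  aes(fill = as.factor(after_stat(level))))

# Count the distinct contour levels the stat actually produced.
n_levels <- length(unique(layer_data(base_plot)$level))

# Hypothetical choice: hide the three lowest levels.
n_hidden <- 3
n_shown <- n_levels - n_hidden
stopifnot(n_shown > 0)

# NA for hidden levels, then a light-to-dark blue ramp for the rest.
fill_values <- c(rep(NA, n_hidden),
                 colorRampPalette(c("#BDD7E7", "#08519C"))(n_shown))

p_manual <- base_plot + scale_fill_manual(values = fill_values)
p_manual
```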

3
votes

You have to generate the 2D kernel density manually and then plot the result. This way you can choose the value at each point, for example to avoid overlap. Here is the code:

plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Label = c(rep('A', 300), rep('B', 150)))


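library(ggplot2)
library(MASS)      # kde2d()
library(dplyr)     # group_by(), do(), mutate(), if_else()
library(tidyr)     # spread()
library(magrittr)  # %<>%
# Calculate the axis ranges, shared by both groups
xlim <- range(plot_data$X)
ylim <- range(plot_data$Y)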


# Generate the kernel density for each group
newplot_data <- plot_data %>% group_by(Label) %>% do(Dens=kde2d(.$X, .$Y, n=100, lims=c(xlim,ylim)))

# Transform the density into a data.frame
newplot_data  %<>%  do(Label=.$Label, V=expand.grid(.$Dens$x,.$Dens$y), Value=c(.$Dens$z)) %>% do(data.frame(Label=.$Label,x=.$V$Var1, y=.$V$Var2, Value=.$Value))

# Spread to wide format (one column per label) and choose the value for each point:
# here, the value and label of the group with the higher density
newplot_data %<>% spread(Label, value = Value) %>%
  mutate(Level = if_else(A > B, A, B), Label = if_else(A > B, "A", "B"))

Contour plot:

# Contour plot
ggplot(newplot_data, aes(x,y, z=Level, fill=Label, alpha=..level..))  + stat_contour(geom="polygon")


The contour plot seems to have some overlap due to rounding errors. We can try a raster plot instead:

#Raster plot
ggplot(newplot_data, aes(x,y, fill=Label, alpha=Level))  + geom_raster()

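Once the density is on a grid, restricting the plot to high-density areas becomes a plain filter on Level. The sketch below is one possible way to do this and is not the code above: it rebuilds the same kind of grid with base R and MASS only, assuming the two labels are "A" and "B", and uses a hypothetical cut-off min_level chosen for this simulated data.

```r
library(ggplot2)
library(MASS)

set.seed(123)
plot_data <- data.frame(
  X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
  Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
  Label = c(rep("A", 300), rep("B", 150))
)

x_lim <- range(plot_data$X)
y_lim <- range(plot_data$Y)

# One kde2d() result per label, all evaluated on the same 100 x 100 grid.
dens <- lapply(split(plot_data, plot_data$Label), function(d) {
  kde2d(d$X, d$Y, n = 100, lims = c(x_lim, y_lim))
})

# kde2d() stores z with x along rows, which matches expand.grid()'s ordering.
density_grid <- expand.grid(x = dens$A$x, y = dens$A$y)
density_grid$A <- c(dens$A$z)
density_grid$B <- c(dens$B$z)

# At each grid point keep the larger density and remember which label it is.
density_grid$Level <- pmax(density_grid$A, density_grid$B)
density_grid$Label <- ifelse(density_grid$A > density_grid$B, "A", "B")

# Hypothetical cut-off: drop every cell below this density.
min_level <- 0.01
high_density <- density_grid[density_grid$Level > min_level, ]

p_raster <- ggplot(high_density, aes(x, y, fill = Label, alpha = Level)) +
  geom_raster()
p_raster
```

As noted in the comments on the question, a plot cut this way should state clearly that the shaded areas do not cover all of the data in each group.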