I am experimenting with ways to deal with overplotting in R, and one thing I want to try is to plot individual points but color them by the density of their neighborhood. In order to do this I would need to compute a 2D kernel density estimate at each point. However, it seems that the standard kernel density estimation functions are all grid-based. Is there a function for computing 2D kernel density estimates at specific points that I specify? I would imagine a function that takes x and y vectors as arguments and returns a vector of density estimates.
15
votes
Is there a specific reason why alpha blending or more standard binning approaches (like hexagonal binning) aren't sufficient?
– joran
I want outliers to be clearly visible as individual points. Alpha belnding makes the outliers faint, and hexagonal binning turns them into entire hexagons instead of single points. Kernel density estimation on the entire grid does a good job for most of the data, but all the outlier points turn into little gaussian "puffs", so I want to instead compute the kernel density estimate and use it to assign a color to each point. This would produce essentially the same appearance as the grid-based approach wherever lots of points overlap, but would make outliers plainly visible as discrete points.
– Ryan C. Thompson
2 Answers
6
votes
If I understand what you want to do, it could be achieved by fitting a smoothing model to the grid density estimate and then using that to predict the density at each point you are interested in. For example:
# Simulate some data and put in data frame DF
n <- 100
x <- rnorm(n)
y <- 3 + 2* x * rexp(n) + rnorm(n)
# add some outliers
y[sample(1:n,20)] <- rnorm(20,20,20)
DF <- data.frame(x,y)
# Calculate 2d density over a grid
library(MASS)
dens <- kde2d(x,y)
# create a new data frame of that 2d density grid
# (needs checking that I haven't stuffed up the order here of z?)
gr <- data.frame(with(dens, expand.grid(x,y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")
# Fit a model
mod <- loess(zgr~xgr*ygr, data=gr)
# Apply the model to the original data to estimate density at that point
DF$pointdens <- predict(mod, newdata=data.frame(xgr=x, ygr=y))
# Draw plot
library(ggplot2)
ggplot(DF, aes(x=x,y=y, color=pointdens)) + geom_point()
Or, if I just change n 10^6 we get
4
votes
I eventually found the precise function I was looking for: interp.surface
from the fields
package. From the help text:
Uses bilinear weights to interpolate values on a rectangular grid to arbitrary locations or to another grid.