15
votes

I am experimenting with ways to deal with overplotting in R, and one thing I want to try is to plot individual points but color them by the density of their neighborhood. In order to do this I would need to compute a 2D kernel density estimate at each point. However, it seems that the standard kernel density estimation functions are all grid-based. Is there a function for computing 2D kernel density estimates at specific points that I specify? I would imagine a function that takes x and y vectors as arguments and returns a vector of density estimates.

2
Is there a specific reason why alpha blending or more standard binning approaches (like hexagonal binning) aren't sufficient?joran
I want outliers to be clearly visible as individual points. Alpha belnding makes the outliers faint, and hexagonal binning turns them into entire hexagons instead of single points. Kernel density estimation on the entire grid does a good job for most of the data, but all the outlier points turn into little gaussian "puffs", so I want to instead compute the kernel density estimate and use it to assign a color to each point. This would produce essentially the same appearance as the grid-based approach wherever lots of points overlap, but would make outliers plainly visible as discrete points.Ryan C. Thompson

2 Answers

6
votes

If I understand what you want to do, it could be achieved by fitting a smoothing model to the grid density estimate and then using that to predict the density at each point you are interested in. For example:

# Simulate some data and put in data frame DF
n <- 100
x <- rnorm(n)
y <- 3 + 2* x * rexp(n) + rnorm(n)
# add some outliers
y[sample(1:n,20)] <- rnorm(20,20,20)
DF <- data.frame(x,y)

# Calculate 2d density over a grid
library(MASS)
dens <- kde2d(x,y)

# create a new data frame of that 2d density grid
# (needs checking that I haven't stuffed up the order here of z?)
gr <- data.frame(with(dens, expand.grid(x,y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")

# Fit a model
mod <- loess(zgr~xgr*ygr, data=gr)

# Apply the model to the original data to estimate density at that point
DF$pointdens <- predict(mod, newdata=data.frame(xgr=x, ygr=y))

# Draw plot
library(ggplot2)
ggplot(DF, aes(x=x,y=y, color=pointdens)) + geom_point()

enter image description here

Or, if I just change n 10^6 we get

enter image description here

4
votes

I eventually found the precise function I was looking for: interp.surface from the fields package. From the help text:

Uses bilinear weights to interpolate values on a rectangular grid to arbitrary locations or to another grid.