3
votes

I have a dataframe with two columns x and y that each contain values between 0 and 100 (the data are paired). I want to correlate them to each other using binned scatter plots. If I were to use a regular scatter plot, it would be easy to do:

geom_point(aes(x=x, y=y))

But I'd like to instead bin the points into N bins from 0 to 100, compute the average x and the average y of the points in each bin, and show those averages as a scatter plot. In other words, I want to correlate the binned averages instead of the raw data points.

Is there a clever/quick way to do this in ggplot2, using some combination of geom_smooth() and geom_point()? Or does it have to be pre-computed manually and then plotted?


2 Answers

8
votes

Yes, you can use stat_summary_bin.

set.seed(42)
x <- runif(1e4)
y <- x^2 + x + 4 * rnorm(1e4)
df <- data.frame(x=x, y=y)

library(ggplot2)
(ggplot(df, aes(x=x,y=y)) +
  geom_point(alpha = 0.4) +
  stat_summary_bin(fun='mean', bins=20,  # fun.y was deprecated in ggplot2 3.3.0
                   color='orange', size=2, geom='point'))

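One caveat: as far as I know, stat_summary_bin averages only y and draws each point at the centre of its x bin, not at the mean of the x values in that bin. If you also want the mean of x per bin, as the question describes, pre-computing is one option. Here is a minimal sketch using base R's cut() and aggregate(); the names n_bins, bin and binned are just illustrative, and the breaks run from 0 to 1 to match the example data above (for data on 0 to 100 you would use seq(0, 100, ...) instead).

```r
library(ggplot2)

# Same simulated data as in the example above
set.seed(42)
x <- runif(1e4)
y <- x^2 + x + 4 * rnorm(1e4)
df <- data.frame(x = x, y = y)

# n_bins is an illustrative choice, not a recommendation
n_bins <- 20

# Assign each row to one of n_bins equal-width intervals on [0, 1]
df$bin <- cut(df$x,
              breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)

# Mean of x and mean of y within each bin (one row per non-empty bin)
binned <- aggregate(cbind(x, y) ~ bin, data = df, FUN = mean)

# Scatter plot of the binned averages
p <- ggplot(binned, aes(x = x, y = y)) +
  geom_point(color = "orange", size = 2)
print(p)
```

The resulting binned data frame can also be passed to cor() or to geom_smooth(method = "lm") if you want to correlate the binned averages rather than just plot them.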

0
votes

I suggest geom_bin2d.

DF <- data.frame(x=1:100,y=1:100+rnorm(100))

library(ggplot2)
p <- ggplot(DF,aes(x=x,y=y)) + geom_bin2d()
print(p)
