
I recently came across a problem with ggplot2::geom_density that I am not able to solve. I am trying to visualise the density of a variable and compare it to a constant. To plot the density, I am using ggplot2::geom_density. The variable whose density I am plotting, however, happens to be a constant (this time):

library(ggplot2)

df <- data.frame(matrix(1, ncol = 1, nrow = 100))
colnames(df) <- "dummy"

dfV <- data.frame(matrix(5, ncol = 1, nrow = 1))
colnames(dfV) <- "latent"

ggplot() +
  geom_density(data = df, aes(x = dummy, colour = 's'),
               fill = '#FF6666', alpha = 0.2, position = "identity") +
  geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)

[Plot omitted]

This is OK and what I would expect. But when I shift this distribution to the far right, I get a plot like this:

df <- data.frame(matrix(71,ncol = 1, nrow = 100))
colnames(df) <- "dummy"

dfV <- data.frame(matrix(75,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"

ggplot() + 
  geom_density(data = df, aes(x = dummy, colour = 's'), 
               fill = '#FF6666', alpha = 0.2, position = "identity") +
  geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2) 

[Plot omitted]

This probably means that the kernel estimation is still taking 0 as the centre of the distribution (right?).
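One way to test that guess is to look at the numbers ggplot2 actually computed for the density layer, using ggplot2::layer_data(). The sketch below rebuilds the second example (using rep() instead of matrix(), which gives the same column). As far as I can tell, with the default trim = FALSE, stat_density() evaluates the curve only over the range of the x scale, and that scale is trained on both dummy (71) and the vline's xintercept (75). So only the stretch between 71 and 75 is drawn.

```r
library(ggplot2)

df  <- data.frame(dummy = rep(71, 100))
dfV <- data.frame(latent = 75)

p <- ggplot() +
  geom_density(data = df, aes(x = dummy, colour = 's'),
               fill = '#FF6666', alpha = 0.2) +
  geom_vline(data = dfV, aes(xintercept = latent, colour = 'ls'))

# Computed data for the first layer (the density)
ld <- layer_data(p, 1)

# The evaluation grid spans only the x-scale range, i.e. 71 to 75
print(range(ld$x))
```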

Is there any way to circumvent this? I would like to see a plot like the first one, only with the centre of the kernel density at 71 and the vline at 75.

Thanks

It has to do with the adjust and bw parameters passed to stats::density via ggplot2::stat_density. I'm not sure exactly how to modify them to get your intended result... though of course, doing density estimation of a constant is silly. (Alex W)
Well? Is this helpful? (Mike Wise)
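The pointer to bw in the first comment can be checked directly. With the default bw = "nrd0", stats::density() uses bw.nrd0(), and when the sample has no spread at all (standard deviation and IQR both 0) that rule falls back to abs(x[1]), or to 1 if that is also 0. The bandwidth of a constant sample therefore grows with the constant itself. A minimal check on 100 points:

```r
n <- 100
vals <- c(0, 1, 10, 71)

# Default bandwidth rule behind stats::density(bw = "nrd0") for constant samples
bws <- sapply(vals, function(v) bw.nrd0(rep(v, n)))
print(data.frame(value = vals, bw = bws))

# For a constant v != 0 the rule reduces to 0.9 * |v| * n^(-1/5)
print(0.9 * 71 * n^(-0.2))
```

This gives about 0.358 for both 0 and 1, 3.58 for 10 and 25.4 for 71, so the curve for 71 is roughly 71 times wider than the one for 1.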

1 Answer


Well, I am not sure exactly what the code does internally, but I suspect that the geom_density primitive was not designed for the case where all the values are the same, and that it makes some assumptions about the distribution that are not what you expect. Here is some code and a plot that shed some light:

# Generate 10 data sets, each holding 100 copies of one constant
# (0, 10, ..., 90), and merge them into a single data frame

dfs <- list()
for (i in 1:10){
  v <- 10*(i-1)
  dfs[[i]] <- data.frame(dummy=rep(v,100),facet=v)
}
df <- do.call(rbind,dfs)

# facet plot them
ggplot() +
  geom_density(data = df, aes(x = dummy, colour = 's'),
               fill = '#FF6666', alpha = 0.5, position = "identity") +
  facet_wrap(~ facet, ncol = 5)

Yielding:

[Plot omitted: one density panel per constant 0, 10, ..., 90]

So it is not doing what you thought it was, but it is also probably not doing what you want. You could, of course, make it (almost) "translation-invariant" by adding a little noise, for example like this:

set.seed(1234)

noise <- rnorm(100, 0, 1e-3)  # the same noise vector is reused for every facet
dfs <- list()
for (i in 1:10){
  v <- 10*(i-1)
  dfs[[i]] <- data.frame(dummy=rep(v,100)+noise,facet=v)
}
df <- do.call(rbind,dfs)

ggplot() + 
  geom_density(data = df, aes(x = dummy, colour = 's'), 
               fill = '#FF6666', alpha = 0.5, position = "identity") +
  facet_wrap( ~ facet,ncol=5 )

Yielding:

[Plot omitted: the same facets with noise added]
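Why the noise helps can be seen from the default bandwidth rule, bw.nrd0(): once the values are no longer identical, the bandwidth is driven by their spread, which is the same in every facet, instead of by the value itself. A minimal check, regenerating the noise with the same seed so the snippet runs on its own:

```r
set.seed(1234)
noise <- rnorm(100, 0, 1e-3)

# Default bandwidth for the noisy constant at each facet value 0, 10, ..., 90
vals <- seq(0, 90, by = 10)
bws <- sapply(vals, function(v) bw.nrd0(v + noise))
print(data.frame(value = vals, bw = signif(bws, 3)))
```

All ten bandwidths come out essentially identical and far below 1e-3.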

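Note that the estimated density looks a bit different in each panel. This is probably not randomness, since stats::density() is deterministic: the bandwidth here is tiny (on the order of 1e-4) compared with the spacing of the 512-point grid the density is evaluated on (about 0.18 across 0 to 90), so each panel's narrow peak is sampled at a slightly different offset.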
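A different workaround, assuming what you want is a bump of fixed width centred on the constant, is to pass an explicit bw to geom_density(), which forwards it to stat_density(). Because the curve is only drawn over the x-scale range, this sketch also widens that range with expand_limits(). The bandwidth of 0.5 and the limits 66 to 80 are arbitrary illustration values, not recommendations.

```r
library(ggplot2)

df  <- data.frame(dummy = rep(71, 100))
dfV <- data.frame(latent = 75)

p <- ggplot() +
  # Fixed bandwidth instead of the default bw.nrd0() rule (0.5 is arbitrary)
  geom_density(data = df, aes(x = dummy, colour = 's'),
               fill = '#FF6666', alpha = 0.2, bw = 0.5) +
  geom_vline(data = dfV, aes(xintercept = latent, colour = 'ls')) +
  # Widen the x scale on both sides of 71 so the whole curve is evaluated
  expand_limits(x = c(66, 80))

print(p)
```

The same bw (or a relative adjust) can be reused for any constant, so the shape no longer depends on where the constant sits on the axis.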