3
votes

I am trying to plot a point histogram (a histogram that shows the values with a point instead of bars) that is log-scaled. The result should look like this:

Target Graph
MWE:

Let's simulate some data:

library(ggplot2)

set.seed(123)
d <- data.frame(x = rnorm(1000))

To get the point histogram, I first need to calculate the histogram data (hdata):

hdata <- hist(d$x, plot = FALSE)
tmp <- data.frame(mids = hdata$mids, 
                  density = hdata$density, 
                  counts = hdata$counts)

which we can plot like this

p <- ggplot(tmp, aes(x = mids, y = density)) + geom_point() + 
            stat_function(fun = dnorm, col = "red")
p

to get this graph: First Try

In theory we should be able to apply the log scales (and set the y-limits to be above 0) and we should have a similar picture to the target graph.

However, if I apply it I get the following graph:

p + scale_y_log10(limits = c(0.001, 10))

Wrong Target

The stat_function curve is clearly drawn with untransformed values instead of following something close to the solid line in the first picture.

Any ideas?

Bonus: Is there a way to plot the histogram with dots without using the hist(..., plot = FALSE) function?

EDIT: Workaround

One possible solution is to calculate the dnorm data outside of ggplot and then add it as a line. For example:

tmp2 <- data.frame(mids = seq(from = min(tmp$mids), to = max(tmp$mids), 
                            by = (max(tmp$mids) - min(tmp$mids))/10000))
tmp2$dnorm <- dnorm(tmp2$mids) 

# Plot it
ggplot() + 
  geom_point(data = tmp, aes(x = mids, y = density)) + 
  geom_line(data = tmp2, aes(x = mids, y = dnorm), col = "red") + 
  scale_y_log10()

This returns a graph like the following. It is basically the target graph, but it doesn't resolve the stat_function issue.

2 Answers

3
votes
library(ggplot2)
set.seed(123)
d <- data.frame(x = rnorm(1000))
ggplot(d, aes(x)) +
  stat_bin(geom = "point", 
           aes(y = after_stat(density)),  # ..density.. in ggplot2 < 3.3.0
           #same breaks as function hist's default:
           breaks = pretty(range(d$x), n = nclass.Sturges(d$x), min.n = 1), 
           position = "identity") +
  stat_function(fun = dnorm, col = "red") +
  scale_y_log10(limits = c(0.001, 10))

resulting plot
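
The comment in the code above says the breaks match the default of hist. A minimal base R sketch to check that claim for this simulated data follows. The variable names (hist_breaks, manual_breaks, and so on) are made up for illustration, and the check only covers the default breaks = "Sturges" case.

```r
# Base R only; nothing is plotted.
set.seed(123)
x <- rnorm(1000)

# Breaks and bin midpoints that hist() picks by default.
h <- hist(x, plot = FALSE)
hist_breaks <- h$breaks

# Breaks built by hand, exactly as passed to stat_bin() above.
manual_breaks <- pretty(range(x), n = nclass.Sturges(x), min.n = 1)

# TRUE if both sets of breaks agree.
breaks_match <- isTRUE(all.equal(hist_breaks, manual_breaks))

# Bin centres derived from the manual breaks, compared with hist()'s mids.
n_breaks <- length(manual_breaks)
manual_mids <- (manual_breaks[-1] + manual_breaks[-n_breaks]) / 2
mids_match <- isTRUE(all.equal(manual_mids, h$mids))

print(c(breaks = breaks_match, mids = mids_match))
```

If both checks come back TRUE, the points drawn by stat_bin sit in the same bins as the mids/density pairs computed with hist in the question, which also covers the bonus question.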

0
votes

Another possible solution, which I found while revisiting this issue, is to apply log10 inside the stat_function call:

library(ggplot2)

set.seed(123)
d <- data.frame(x = rnorm(1000))

hdata <- hist(d$x, plot = FALSE)
tmp <- data.frame(mids = hdata$mids, 
                  density = hdata$density, 
                  counts = hdata$counts)

ggplot(tmp, aes(x = mids, y = density)) + geom_point() + 
  stat_function(fun = function(x) log10(dnorm(x)), col = "red") +
  scale_y_log10()

Created on 2018-07-25 by the reprex package (v0.2.0).
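
A rough sketch of why the extra log10 helps, assuming the behaviour described in the question: the values returned by stat_function are used as vertical positions on the already-transformed axis without being log-transformed themselves. Here p denotes the position on that axis, f the function passed to stat_function, and \varphi the standard normal density (dnorm); these symbols are introduced only for illustration.

```latex
% On a log10 axis, a position p is labelled with the value 10^{p}.
% Points: the scale transforms the data, so a density y is drawn at
\[
  p_{\text{point}} = \log_{10} y .
\]
% Curve (assumed behaviour): the function value is used as the position directly,
\[
  p_{\text{curve}}(x) = f(x)
  \quad\Longrightarrow\quad
  \text{displayed value} = 10^{f(x)} .
\]
% With f = \varphi the curve is displayed as 10^{\varphi(x)}, which lies between
% 1 and 10^{\varphi(0)} \approx 10^{0.399} \approx 2.5, far above the points.
% With f = \log_{10}\varphi the displayed value is
\[
  10^{\log_{10}\varphi(x)} = \varphi(x),
\]
% which is the curve that was wanted.
```

This only holds under the stated assumption. If a given ggplot2 version already passes the stat_function output through the scale transformation, the extra log10 would be applied twice and plain dnorm would be the right function to pass.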