
I've seen many examples of density plots, but their y-axis shows density (probability) rather than counts. What I am looking for is a line plot (like a density plot) whose y-axis contains counts (like a histogram).

I can do this in Excel, where I manually make the bins and frequencies, create a bar histogram, and then change the chart type to a line, but I can't find anything similar in R.

I've checked both base graphics and ggplot2 but can't seem to find an answer. I understand that histograms are meant to be bars, but I think representing them as a continuous line makes more visual sense.
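For reference, the Excel workflow described above (make bins, count frequencies, switch the chart to a line) can be reproduced in base R roughly as follows. This is a sketch: the simulated data and the 0.5 bin width are assumptions for illustration. hist(..., plot = FALSE) returns the bin midpoints ($mids) and counts ($counts) without drawing anything, and those are then joined with a line.

```r
set.seed(1)
x <- rnorm(1000)                      # example data (assumed)
binwidth <- 0.5                       # assumed bin width
breaks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)
h <- hist(x, breaks = breaks, plot = FALSE)   # compute bin counts only, no drawing
# join the bin midpoints with a line instead of drawing bars
plot(h$mids, h$counts, type = "l", xlab = "Value", ylab = "Count")
```

The line starts and ends at the outermost bin midpoints rather than dropping to zero; appending a zero-count point on each side would close it off.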

I'm not sure you have your terminology quite right. To me, a line histogram would be something like plot(..., type = "h"), that is, a histogram with vertical lines rather than bars. Your question suggests that you want a density plot with count on the y-axis. (Richie Cotton)
Yes, you're right: a density plot with count on the y-axis. (asangoi)

5 Answers


Using default R graphics (i.e. without installing ggplot2), you can do the following, which might also make what the density() function does a bit clearer:

# Generate some data
data=rnorm(1000)
# Get the density estimate
dens=density(data)
# Plot y-values scaled by number of observations against x values
plot(dens$x,length(data)*dens$y,type="l",xlab="Value",ylab="Count estimate")
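One caveat: length(data) * dens$y is a count per unit of x, so its height matches histogram bars only when the bin width is 1. To overlay it on a histogram with a different bin width, multiply by that width as well, since the expected count in a bin is roughly n * density * bin width. A minimal sketch, assuming simulated data and bins of width 0.5:

```r
set.seed(1)
data <- rnorm(1000)
binwidth <- 0.5                                   # assumed bin width
breaks <- seq(floor(min(data)), ceiling(max(data)), by = binwidth)
dens <- density(data)
# expected count per bin = n * density * bin width
count_curve <- length(data) * binwidth * dens$y
h <- hist(data, breaks = breaks, plot = FALSE)
# make the y-range tall enough for both the bars and the curve
plot(h, ylim = c(0, max(h$counts, count_curve)),
     xlab = "Value", ylab = "Count", main = "")
lines(dens$x, count_curve, col = "red")
```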

This is an old question, but I thought it might be helpful to post a solution that specifically addresses your question.

In ggplot2, you can plot a histogram and display the count with bars using:

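library(ggplot2)
# assumes `data` is a data frame with a numeric column named `value`
ggplot(data, aes(x = value)) +
  geom_histogram()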

You can also plot a histogram and display the count with lines using a frequency polygon:

ggplot(data, aes(x = value)) +
  geom_freqpoly()

For more info, see the ggplot2 reference documentation for geom_freqpoly().
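A sketch of combining the two layers (the data frame df, its column value and the 0.5 bin width are assumptions for illustration): giving geom_histogram() and geom_freqpoly() the same binwidth makes the line pass through the centres of the bar tops, because both layers use the same binning stat. layer_data() is used here only to inspect the computed counts.

```r
library(ggplot2)
set.seed(1)
df <- data.frame(value = rnorm(1000))   # hypothetical example data
p <- ggplot(df, aes(x = value)) +
  geom_histogram(binwidth = 0.5, fill = "lightblue", colour = "grey40") +
  geom_freqpoly(binwidth = 0.5, colour = "red")   # same bins, drawn as a line
print(p)
bars <- layer_data(p, 1)   # computed histogram counts
poly <- layer_data(p, 2)   # computed frequency-polygon counts
```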


To adapt the example on the ?stat_density help page:

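library(ggplot2)
library(ggplot2movies)  # the movies data set now lives in this package
m <- ggplot(movies, aes(x = rating))
# Standard density plot.
m + geom_density()
# Density plot with y-axis scaled to counts (after_stat() replaces the older ..count.. notation).
m + geom_density(aes(y = after_stat(count)))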
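Note that the computed count variable (written ..count.. in older code, after_stat(count) in current ggplot2) is the density multiplied by the number of observations, i.e. a count per unit of x. To overlay it on a histogram, scale it by the histogram's bin width. A sketch with simulated data (df and the 0.5 bin width are assumptions):

```r
library(ggplot2)
set.seed(1)
df <- data.frame(value = rnorm(1000))   # hypothetical example data
p <- ggplot(df, aes(x = value)) +
  geom_histogram(binwidth = 0.5, fill = "lightblue") +
  # count = density * n (per unit of x); multiply by the bin width to match bar heights
  geom_density(aes(y = after_stat(count) * 0.5), colour = "red")
print(p)
dens_layer <- layer_data(p, 2)   # computed density layer
```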

Although this is an old question, the following might be useful. Suppose you have a data set of 10,000 points that you believe follow a certain distribution, and you want to plot a histogram of the actual data with the probability density of the fitted distribution drawn on top of it.

noise <- 2
#
# the noise is tacked onto the end using runif
# just to demo issues with real data and fitting;
# the subtraction causes the data to have some
# negative values, which must be addressed in
# the fit later on
#
noisylognorm <- rlnorm(10000,
                       meanlog = 0.25,
                       sdlog = 1) +
                (noise * runif(10000) - noise / 10)
#
# using package fitdistrplus
#
library(fitdistrplus)
#
# subset is used to remove the negative values,
# as the lognormal distribution needs positive values only
#
fitlnorm <- fitdist(subset(noisylognorm,
                           noisylognorm > 0),
                    "lnorm")
fitlnorm_density <- density(rlnorm(10000,
                                   meanlog = fitlnorm$estimate[1],
                                   sdlog = fitlnorm$estimate[2]))
hist(subset(noisylognorm, 
            noisylognorm < 25),
     breaks = seq(-1, 25, 0.5),
     col = "lightblue",
     xlim = c(0, 25),
     xlab = "value",
     ylab = "frequency",
     main = paste0("Log Normal Distribution\n",
                   "noise = ", noise))

lines(fitlnorm_density$x, 
      10000 * fitlnorm_density$y * 0.5,
      type="l",
      col = "red")

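Note the * 0.5 in the lines() call. It is the bin width set by breaks = seq(-1, 25, 0.5): the expected count in a bin is approximately n * density * bin width, so the density curve has to be multiplied by both the number of points (10000) and the bin width to match the frequency scale of hist().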
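A variant of the same idea, as a sketch: instead of estimating the fitted curve with density() on a fresh simulated sample, you can evaluate the fitted lognormal density directly with dlnorm() and scale it by the number of fitted points times the bin width. To keep the block free of package dependencies, the lognormal maximum-likelihood estimates are computed here in closed form (mean and standard deviation of log(x), the latter with divisor n) in place of fitdistrplus::fitdist(); the seed and the evaluation grid xs are assumptions for illustration.

```r
set.seed(42)
n <- 10000
noise <- 2
# same kind of noisy lognormal data as above
noisylognorm <- rlnorm(n, meanlog = 0.25, sdlog = 1) +
  (noise * runif(n) - noise / 10)
pos <- noisylognorm[noisylognorm > 0]   # lognormal needs positive values
# closed-form lognormal MLE (stands in for fitdist() in this sketch)
meanlog_hat <- mean(log(pos))
sdlog_hat <- sqrt(mean((log(pos) - meanlog_hat)^2))
binwidth <- 0.5
shown <- noisylognorm[noisylognorm < 25]
h <- hist(shown,
          breaks = seq(-1, 25, binwidth),
          col = "lightblue",
          xlim = c(0, 25),
          xlab = "value",
          ylab = "frequency",
          main = paste0("Log Normal Distribution\n", "noise = ", noise))
xs <- seq(0.01, 25, length.out = 500)
# expected count per bin = (number of fitted points) * density * bin width
expected <- length(pos) * binwidth *
  dlnorm(xs, meanlog = meanlog_hat, sdlog = sdlog_hat)
lines(xs, expected, col = "red")
```

Scaling by length(pos) rather than 10000 reflects that only the positive values were used in the fit.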


For count (integer-valued) data there is a very simple and fast way.

First let's generate some dummy count data:

my.count.data = rpois(n = 10000, lambda = 3)

And then the plotting command (assuming you have called library(magrittr)):

my.count.data %>% table %>% plot
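Plotting a one-dimensional table like this draws a vertical line at each value (plot type "h"), i.e. the "line histogram" mentioned in the comments. If you would rather join the counts into a single line, a base R sketch (no magrittr needed) follows; the factor() call with explicit levels makes values that never occur show up as zero counts instead of being silently bridged by the line.

```r
set.seed(1)
my.count.data <- rpois(n = 10000, lambda = 3)
# include every value from 0 to the maximum so zero-count values are not skipped
counts <- table(factor(my.count.data, levels = 0:max(my.count.data)))
values <- as.integer(names(counts))
# "b" draws both points and connecting lines
plot(values, as.vector(counts), type = "b",
     xlab = "value", ylab = "frequency")
```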