I am trying to superimpose a function via stat_function() in ggplot2, as described in "Superimposing a log-normal density in ggplot and stat_function()", using the command:

ggplot(data=data, aes(x=x)) +
  geom_histogram(aes(y = ..density..)) +
  stat_function(fun = dlnorm, size=1, color='gray') +
  theme_bw()

It works with the provided example, where the data to fit are generated with rf(). However, when I apply it to the data set below, the curve does not fit. What is wrong with my data set that prevents stat_function() from fitting it? Is there a mathematical mistake in what I am trying to do? Is there a problem with the number type in my data.frame?

Here are the two results I get, each with its data set:

Does not fit:

data <- data.frame(x=c(83.92527, 75.72644, 76.44609, 100.86324, 87.44626, 78.37094, 77.71285, 94.66197, 69.76701, 83.93192, 68.26451, 71.49349, 66.51735, 76.72893, 76.76861, 81.38741, 67.9929, 74.44888, 86.06689, 76.9507, 123.47084, 90.56689, 81.50586, 74.04925, 71.85926, 91.60573, 74.57221, 68.53912, 75.34062, 80.65242, 85.15228, 104.06124, 72.42447, 75.27314, 73.01164, 84.94915, 80.04429, 86.93343, 82.04338, 77.70276, 84.0946, 84.35794, 96.01299, 72.26497, 115.12634, 74.87349, 80.4077, 77.33795, 73.4267, 68.03937, 82.50726, 78.13893, 68.7824, 85.83253, 80.94278, 78.06742, 75.68488, 133.39636, 92.89265, 80.01308, 187.60977, 86.73605, 76.10981, 71.80097, 78.31453, 75.60157, 86.07133, 76.92616, 71.48474, 133.32378, 78.6234, 131.75722, 82.31215, 74.46081, 73.87192, 82.53808, 74.79978, 68.17945, 112.14891, 89.37358, 79.76679, 75.2691, 86.79122, 79.46324, 86.15034, 74.70525, 71.61041, 82.48748, 77.10785, 73.95811, 76.25556, 82.17103, 75.97427, 80.19654, 88.01052, 75.10031, 85.93202, 78.12773, 72.52136, 93.67812))

Fits:

data <- data.frame(x = rf(100, df1 = 7, df2 = 120))

1 Answer

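dlnorm takes the parameters meanlog and sdlog (the mean and standard deviation on the log scale), and their defaults are 0 and 1. stat_function() does not fit anything; it only draws the function with the arguments it is given. The rf example appears to fit only because those values happen to lie near 1, where the default curve has most of its mass. You have to estimate the parameters for the actual dataset yourself. This can be done with the function fitdistr in the MASS package.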

library(MASS)
fit <- fitdistr(data$x, "lognormal")
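
For the log-normal, fitdistr uses the closed-form maximum-likelihood estimates, so the fit can be sanity-checked in base R. The following is a minimal sketch, not part of the original answer: it uses only the first ten values of the question's data set, and the names log_x, meanlog_hat and sdlog_hat are made up for illustration. It also shows why the default curve looks flat: with meanlog = 0 and sdlog = 1, the density at values around 80 is vanishingly small.

```r
# First ten values from the question's data set (small sample for illustration).
x <- c(83.92527, 75.72644, 76.44609, 100.86324, 87.44626,
       78.37094, 77.71285, 94.66197, 69.76701, 83.93192)

# Closed-form maximum-likelihood estimates for a log-normal:
# the mean and the (divide-by-n) standard deviation of log(x).
log_x <- log(x)
meanlog_hat <- mean(log_x)
sdlog_hat <- sqrt(mean((log_x - meanlog_hat)^2))

# Density at the sample median: default parameters vs. estimated parameters.
m <- median(x)
default_density <- dlnorm(m)  # meanlog = 0, sdlog = 1
fitted_density <- dlnorm(m, meanlog = meanlog_hat, sdlog = sdlog_hat)

print(c(meanlog = meanlog_hat, sdlog = sdlog_hat))
print(c(default = default_density, fitted = fitted_density))
```

On the full data set, the two estimates should agree with fit$estimate from fitdistr.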

Now, you can use the estimates for the dlnorm function:

ggplot(data=data, aes(x=x)) +
  geom_histogram(aes(y = ..density..)) +
  # pass the estimates under dlnorm's full argument names, meanlog and sdlog
  stat_function(fun = dlnorm, size = 1, color = 'gray',
                args = list(meanlog = fit$estimate[["meanlog"]],
                            sdlog = fit$estimate[["sdlog"]])) +
  theme_bw()