Visualizing the Kolmogorov-Smirnov statistic in ggplot2

Question

The Kolmogorov-Smirnov statistic is defined as the maximum distance between the empirical and the hypothesized cumulative distribution function. Rather than looking at numbers, I think it is much preferable to locate the maximum difference using a graph.

I know how to plot the empirical distribution function:

library(ggplot2)
p1 <- ggplot(data.frame(x = rnorm(30)), aes(x)) + stat_ecdf(geom = "step")

but how can I add the cumulative distribution function of the theoretical distribution to the same plot? In my case the theoretical distribution is the standard normal, but I am interested in an approach that generalizes to any distribution function.

Thank you.

Just use pnorm: x <- seq(-3, 3, length = 100); plot(x, pnorm(x)). For other distributions, use e.g. pbeta, pcauchy, etc. — Gregor Thomas
@Gregor I see, but how can you add that on an existing ggplot2 plot? — JohnK

MrFlick · Accepted Answer · 2014-12-03T22:17:36

If you want to use ggplot, just do

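library(ggplot2)

set.seed(15)
dd <- data.frame(x = rnorm(30))

# empirical CDF of the sample (black steps) with the standard normal CDF overlaid (red)
ggplot(dd, aes(x)) +
    stat_ecdf() +
    stat_function(fun = pnorm, colour = "red")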
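To generalize beyond the standard normal, the same pattern should work with any of R's `p*` distribution functions, passing their parameters through the `args` argument of `stat_function`. A minimal sketch, assuming an exponential distribution with rate 2 (the data frame name `dd_exp` and the rate are made up for illustration):

```r
library(ggplot2)

set.seed(15)
# hypothetical example data: 30 draws from an exponential distribution with rate 2
dd_exp <- data.frame(x = rexp(30, rate = 2))

# ECDF of the sample with the hypothesized exponential CDF overlaid in red;
# args is forwarded to pexp, so the overlay is pexp(x, rate = 2)
p_exp <- ggplot(dd_exp, aes(x)) +
    stat_ecdf() +
    stat_function(fun = pexp, args = list(rate = 2), colour = "red")

p_exp
```

Swapping `pexp` for `pbeta`, `pcauchy`, and so on, with matching `args`, follows the same shape.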

You can find the maximal distance if you like with

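ed <- ecdf(dd$x)                                        # empirical CDF as a function
maxdiffidx <- which.max(abs(ed(dd$x) - pnorm(dd$x)))    # sample point with the largest gap
maxdiffat <- dd$x[maxdiffidx]                           # x value where that gap occurs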

and add that to the plot with

ggplot(dd, aes(x)) +
    stat_ecdf() +
    stat_function(fun = pnorm, colour = "red") +
    geom_vline(xintercept = maxdiffat, linetype = 2)    # dashed line at the largest gap
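One caveat: the snippet above compares the two curves only at the sample points, using the value of the ECDF after each jump. Because the ECDF is a step function, the largest gap can also occur just before a jump. A sketch of the usual two-sided calculation, cross-checked against `ks.test` (the names `xs`, `d_plus`, `d_minus` and `D` are mine, and the standard normal is assumed as the hypothesized distribution):

```r
set.seed(15)
x  <- rnorm(30)
n  <- length(x)
xs <- sort(x)
Fx <- pnorm(xs)                      # hypothesized CDF at the sorted sample points

# gap measured just after each jump, where the ECDF equals i/n
d_plus  <- max(seq_len(n) / n - Fx)
# gap measured just before each jump, where the ECDF equals (i - 1)/n
d_minus <- max(Fx - (seq_len(n) - 1) / n)

D <- max(d_plus, d_minus)            # two-sided Kolmogorov-Smirnov statistic

# should agree with the statistic reported by the built-in test
all.equal(D, unname(ks.test(x, "pnorm")$statistic))
```

The location of the maximum can then be taken from `xs[which.max(...)]` on whichever side is larger and passed to `geom_vline` as before.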
