I'm trying to check that I understand how R calculates the statistic AIC, AICc (corrected AIC) and BIC for a glm()
model object (so that I can perform the same calculations on revoScaleR::rxGlm()
objects - particularly the AICc, which isn't available by default)
I had understood that these were defined as follows:
let p
= number of model parameters
let n
= number of data points
AIC = deviance + 2p
AICc = AIC + (2p^2 + 2p)/(n-p-1)
BIC = deviance + 2p.log(n)
So I tried to replicate these numbers and compare them to the corresponding R function calls. It didn't work:
library(AICcmodavg) # for the AICc() function
data(mtcars)
glm_a1 <- glm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
data = mtcars,
family = gaussian(link = "identity"),
trace = TRUE)
summary(glm_a1)
n <- nrow(glm_a1$data) # 32
p <- glm_a1$rank # 11
dev <- glm_a1$deviance# 147.49
my_AIC <- dev + 2 * p
my_AICc <- my_AIC + (2 * p^2 + 2 * p)/(n - p - 1)
my_BIC <- dev + 2 * p * log(n)
AIC(glm_a1) # 163.71
my_AIC # 169.49
AICc(glm_a1) # 180.13 (from AICcmodavg package)
my_AICc # 182.69
BIC(glm_a1) # 181.30
my_BIC # 223.74
By using debug(AIC)
I can see that the calculation is different. It's based on 12 parameters (one extra for the estimated dispersion/scale parameter?). Also the log likelihood is obtained using logLik()
which brings back a number -69.85
, which suggests to me that the model deviance would be -2*-69.85 = 139.71
(which it isn't).
Does anyone know what I've done wrong please? Thank you.