How to deal with zeros in count data?

Question

I have a data set with count data and fit a Poisson regression with glm. Now I want to compute the null deviance by hand, for which I need the log-likelihood of the full model. For that log-likelihood I get NaN. I think this is because some values of the response variable are 0, so the term 0 * log(0) evaluates to NaN. However, glm does compute the null deviance, so there must be a trick for dealing with the 0 entries in y. Should I replace them with very small values like 0.00001, or what else could I do to get a result for lf that is not NaN?

data(discoveries)
disc <- data.frame(count=as.numeric(discoveries),
                   year=seq(0,(length(discoveries)-1),1))

yearSqr <- disc$year^2

hush <- glm(count ~ year + yearSqr , family = "poisson", disc)


# model frame
test <- hush$model
# response variable
test$count

# formula for the log-likelihood of the full model: lf = sum(y * log(y) - y - log(factorial(y)))


# result is NaN
lf <- sum(test$count * log(test$count) - test$count - log(factorial(test$count)))

Zero-inflated models are only used if we have more zero values than non-zero values, and my data set has only 5 zero entries ... - Dima Ku

Julius Vainora · Accepted Answer · 2018-03-05T21:05:40

The formula you applied is wrong: it does not use any information about the estimated parameters. You want the following instead:

sum(test$count * log(fitted(hush)) - fitted(hush) - log(factorial(test$count)))
# [1] -200.9226
logLik(hush)
# 'log Lik.' -200.9226 (df=3)
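
If the goal is still the null deviance from the question, the zeros can probably be handled without replacing them by small values: y * log(y) tends to 0 as y goes to 0, so that term can be set to 0 for zero counts. The following is a minimal sketch under that convention, not a statement of how glm is implemented internally. The names fit, ylogy, ll_sat, ll_null, ll_fit, null_dev and resid_dev are illustrative, I(year^2) stands in for the separate yearSqr variable, and lgamma(y + 1) replaces log(factorial(y)).

```r
# Sketch: deviances by hand for the Poisson model, using 0 * log(0) := 0.
data(discoveries)
disc <- data.frame(count = as.numeric(discoveries),
                   year  = seq_along(discoveries) - 1)
fit <- glm(count ~ year + I(year^2), family = poisson, data = disc)

y <- disc$count

# y * log(y), with the limiting value 0 where y == 0 (avoids the NaN)
ylogy <- ifelse(y > 0, y * log(y), 0)

# Saturated ("full") model: every fitted mean equals the observation
ll_sat <- sum(ylogy - y - lgamma(y + 1))

# Intercept-only model: the fitted mean is the sample mean (assumes no offset or weights)
ll_null <- sum(dpois(y, mean(y), log = TRUE))

# Fitted model: same quantity that logLik(fit) reports
ll_fit <- sum(dpois(y, fitted(fit), log = TRUE))

null_dev  <- 2 * (ll_sat - ll_null)
resid_dev <- 2 * (ll_sat - ll_fit)

# Compare with the values stored by glm
all.equal(null_dev, fit$null.deviance)
all.equal(resid_dev, deviance(fit))
```

As a cross-check, sum(dpois(y, y, log = TRUE)) should give the same saturated log-likelihood, since dpois treats a zero count with a zero mean as probability 1.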
