I'm estimating a logit regression with multiple predictor variables by hand in R using the mle() method. I'm having trouble passing along the additional arguments needed to calculate log likelihood in the function calcLogLikelihood
below.
Here's my function that calculates negative log likelihood.
calcLogLikelihood <- function(betas, x, y) {
# Computes the negative log-likelihood
#
# Args:
# x: a matrix of the predictor variables in the logit model
# y: a vector of the outcome variable (e.g. living in SF, etc)
# betas: a vector of beta coefficients used in the logit model
#
# Return:
# llf: the negative log-likelihood value (to be minimized via MLE)
#
# Error handling:
# Check if any values are null, and whether there are same number of coefficients as there are predictors
if (TRUE %in% is.na(x) || TRUE %in% is.na(y)) {
stop(" There is one or more NA value in x and y!")
}
nbetas <- sapply(betas, length)
if (nbetas-1 != ncol(x)) {
print(c(length(betas)-1, length(x)))
stop(" Categorical vector and coef vector of different lengths!")
}
linsum <- betas$betas[1] + sum(betas$betas[2:nbetas] * x)
p <- CalcInvlogit(linsum)
llf <- -1 * sum(data$indweight * (y * log(p) + (1-y) * log(1-p)))
return(llf)
}
Here's what my x and y data matrices look like:
> head(x)
agebucket_(0,15] agebucket_(15,30] agebucket_(30,45] agebucket_(45,60] agebucket_(60,75]
1 0 0 1 0 0
2 0 0 1 0 0
3 0 0 1 0 0
4 0 0 1 0 0
5 0 0 1 0 0
6 0 0 0 1 0
> head(y)
[,1]
[1,] 1
[2,] 1
[3,] 0
[4,] 0
[5,] 1
[6,] 0
Here's the call to my function:
# Read in data
data <- read.csv("data.csv")
# cont.x.vars and dummy.x.vars are arrays of predictor variable column names
x.vars <- c(cont.x.vars, dummy.x.vars)
# Select y column. This is the dependent variable name.
y.var <- "Housing"
# Select beta starting values
betas <- list("betas"=c(100, rep(.1, length(x.vars))))
# Select columns from the original dataframe
x <- data.matrix(data[, x.vars])
y <- data.matrix(data[, y.var])
# Minimize LLF
fit <- mle(calcLogLikelihood, betas, x=x, y=y)
Here's my error message:
Error in is.na(x) : 'x' is missing
This error seems to be coming because I'm not passing along the x and y parameters required by calcLogLikelihood
correctly, but I'm not sure what's going wrong. How do I fix this error?
colnames(data)
anddput(x.vars)
– IRTFM