4
votes

I'm estimating a logit regression with multiple predictor variables by hand in R using the mle() method. I'm having trouble passing along the additional arguments needed to calculate log likelihood in the function calcLogLikelihood below.

Here's my function that calculates negative log likelihood.

calcLogLikelihood <- function(betas, x, y) { 
# Computes the negative log-likelihood 
#   
# Args: 
#   x: a matrix of the predictor variables in the logit model 
#   y: a vector of the outcome variable (e.g. living in SF, etc)
#   betas: a vector of beta coefficients used in the logit model 
#  
# Return: 
#   llf: the negative log-likelihood value (to be minimized via MLE)
# 
# Error handling: 
# Check if any values are null, and whether there are same number of coefficients as there are  predictors
  if (TRUE %in% is.na(x) || TRUE %in% is.na(y)) {
    stop(" There is one or more NA value in x and y!")
  }
  nbetas <- sapply(betas, length)
  if (nbetas-1 != ncol(x)) {
     print(c(length(betas)-1, length(x)))
     stop(" Categorical vector and coef vector of different lengths!")
   }
  linsum <- betas$betas[1] + sum(betas$betas[2:nbetas] * x)
  p <- CalcInvlogit(linsum)
  llf <- -1 * sum(data$indweight * (y * log(p) + (1-y) * log(1-p)))
  return(llf)

}

Here's what my x and y data matrices look like:

> head(x)
  agebucket_(0,15] agebucket_(15,30] agebucket_(30,45] agebucket_(45,60] agebucket_(60,75]
1                0                 0                 1                 0                 0
2                0                 0                 1                 0                 0
3                0                 0                 1                 0                 0
4                0                 0                 1                 0                 0
5                0                 0                 1                 0                 0    
6                0                 0                 0                 1                 0

> head(y)
 [,1]
[1,]    1
[2,]    1
[3,]    0
[4,]    0
[5,]    1
[6,]    0

Here's the call to my function:

# Read in data
data <- read.csv("data.csv")   

# cont.x.vars and dummy.x.vars are arrays of predictor variable column names
x.vars <- c(cont.x.vars, dummy.x.vars)

# Select y column. This is the dependent variable name.
y.var <- "Housing"

# Select beta starting values
betas <- list("betas"=c(100, rep(.1, length(x.vars))))

# Select columns from the original dataframe
x <- data.matrix(data[, x.vars])
y <- data.matrix(data[, y.var])

# Minimize LLF
fit <- mle(calcLogLikelihood, betas, x=x, y=y)

Here's my error message:

 Error in is.na(x) : 'x' is missing 

This error seems to be coming because I'm not passing along the x and y parameters required by calcLogLikelihood correctly, but I'm not sure what's going wrong. How do I fix this error?

1
Looks like your 'x.vars'-variable may not match the column names of the 'data'-object. Feel free to prove me wrong by posting: colnames(data) and dput(x.vars)IRTFM

1 Answers

2
votes

The error arises because the function stats4::mle does not pass on any arguments using the ellipsis argument to your likelihood function. Instead the ellipsis is used to pass further arguments to optim (see ?stats4::mle). You have to take care that your likelihood function is only a function of the parameters to be optimized. The data, i.e. x and y, can not be passed in a call to mle.

You have two options. 1. redefine your likelihood function. You can either rely on R's lexical scoping rules in that you treat the data (x, y) as free variables (just remove the arguments x and y from the function definition and define x and y in your workspace), or you define a closure explicitly which is a more robust solution and explained (e.g.) here. 2. you can also use optim instead of mle, which allows you to keep your definition of the likelihood and is used by mle as optimizer in the background.