Background and Task: Consider a random sample of size n with a binary outcome Y_i. Assume Y_i ~ Bern(pi_i). Assume the probit link function pi_i=Phi(X_i^T beta).
Create an X matrix with 200 observations on three covariates, each with standard normal distribution. Generate a y vector assuming the probit model is correct for the data, i.e. choose some values of beta and use these with the X values and probit link function to generate a set of outcomes y.
Write a function in R to fit probit models, taking a vector responses y and a vector of covariates and an intercept, X. Run the model on these data and compare the coefficient estimate to the true values.
Code:
X=rnorm(200*3) # generate 200x3=600 random standard normal values
dim(X)=c(200, 3) # set the dimensions of X to be 200x3
X=cbind(1, X) # add a column of 1's for the intercept
X # print X
beta=c(1,4,2,3) # choose some values of beta
pi_i=pnorm(X%*%beta)
for (i in 1:200) { # generate y vector
y[i]=rbinom(1, 1, pi_i[i])
}
loglik=function(par, X, y) {
pi_est=pnorm(X%*%par) # probit link function
ll=sum(y*log(pi_est)+(1-y)*log(1-pi_est)) # log likelihood for bernoulli sample
return(ll)
}
opt.out=optim(par=c(0,0,0,0), fn=loglik, X=X, y=y, method="BFGS", control=list(fnscale=-1), hessian=TRUE) # error in this line
Issue: I'm getting the error
Error in optim(par = c(0, 0, 0, 0), fn = loglik, X = X, y = y, method = "BFGS", : non-finite finite-difference value [3]
Does anyone know why this is?