
I have panel data consisting of 5908 individuals observed over 5 years. I want to compute the maximum likelihood estimator using the optim() function. Here's my code:

library(pglm)
library(plm)
data("HealthIns")
dat<- pdata.frame(HealthIns,index = c("id","year"))
summary(dat)

dat$claims<-ifelse(dat$size>=4,1,0)
y<-data.matrix(dat$claims)
y[is.na(y)]=0
Y <- matrix(data = y, nrow = 5908, ncol = 5, byrow = TRUE)  # byrow: each id's 5 years are consecutive

dat$ageclass<-ifelse(dat$age >=30,1,0)
x1<-data.matrix(dat$ageclass)
x1[is.na(x1)]=0
X1 <- matrix(data = x1, nrow = 5908, ncol = 5, byrow = TRUE)

dat$gender <-ifelse(dat$sex=="male",1,0)
x2<-data.matrix(dat$gender)
x2[is.na(x2)]=0
X2 <- matrix(data = x2, nrow = 5908, ncol = 5, byrow = TRUE)

dat$child<-ifelse(dat$child=="yes",1,0)
x3<-data.matrix(dat$child)
x3[is.na(x3)]=0
X3 <- matrix(data = x3, nrow = 5908, ncol = 5, byrow = TRUE)
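The four reshape blocks above all follow the same pattern, and since a pdata.frame is sorted by id and then by year, each individual's five yearly values are consecutive in the vector, so the matrix should be filled by row. A small helper could capture this (a sketch; `to_panel_matrix` is a made-up name):

```r
# Sketch: replace NAs with 0, then reshape to one row per individual,
# one column per year (pdata.frame vectors are ordered id-then-year).
to_panel_matrix <- function(x, n.id = 5908, n.year = 5) {
  x <- as.numeric(x)
  x[is.na(x)] <- 0
  matrix(x, nrow = n.id, ncol = n.year, byrow = TRUE)
}
```

Then `Y <- to_panel_matrix(dat$claims)`, and likewise for `X1`, `X2`, and `X3`.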

The negative log-likelihood that I want to minimize is

po.gam <- function(para){
  # lambda(i,t): 5908 x 5 matrix of Poisson intensities
  for (i in 1:5908){
    for (t in 1:5){
      lambda <- matrix(exp(para[1] + para[2]*X1 + para[3]*X2 + para[4]*X3),
                       nrow = 5908, ncol = 5)
    }
  }
  
 
  num.claims.of.t <- numeric(nrow(Y))   # total claims per individual
  for (i in seq(nrow(Y))){
    num.claims.of.t[i] <- sum(Y[i, ])
  }

  num.lambda.of.t <- numeric(nrow(Y))   # total intensity per individual
  for (i in seq(nrow(Y))){
    num.lambda.of.t[i] <- sum(lambda[i, ])
  }

  prod.exp <- numeric(nrow(Y))          # Poisson kernel, multiplied over years
  for (i in seq(nrow(Y))){
    prod.exp[i] <- prod(lambda[i, ]^Y[i, ] / factorial(Y[i, ]))
  }
  
  
  joint.pdf.mvnb <- prod.exp *
    (gamma(num.claims.of.t + 1/para[5]) / gamma(1/para[5])) *
    ((1/para[5]) / (num.lambda.of.t + 1/para[5]))^(1/para[5]) *
    (num.lambda.of.t + 1/para[5])^(-num.claims.of.t)
  
  # product of the joint pdf over all individuals
  prod.mvnb <- prod(joint.pdf.mvnb)
  return(-log(prod.mvnb))
}
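For reference, because the expression inside the nested loops does not depend on i or t, lambda only needs to be computed once, and the per-individual sums and products can be vectorized with rowSums(). Working on the log scale also avoids overflow in gamma() and underflow in the long product. A sketch of the same negative log-likelihood (`po.gam.vec` is a made-up name; it assumes Y, X1, X2, X3 are the 5908 x 5 matrices built above and takes them as arguments):

```r
po.gam.vec <- function(para, Y, X1, X2, X3) {
  a <- para[5]                                    # dispersion parameter (alfa)
  lambda <- exp(para[1] + para[2]*X1 + para[3]*X2 + para[4]*X3)
  S <- rowSums(Y)       # total claims per individual
  L <- rowSums(lambda)  # total intensity per individual
  # per-individual log joint pdf; log scale avoids gamma() overflow
  ll <- rowSums(Y * log(lambda) - lfactorial(Y)) +
    lgamma(S + 1/a) - lgamma(1/a) +
    (1/a) * (log(1/a) - log(L + 1/a)) -
    S * log(L + 1/a)
  -sum(ll)
}
```

With optim() the extra matrices are passed through `...`, e.g. `optim(start.value, po.gam.vec, Y = Y, X1 = X1, X2 = X2, X3 = X3)`.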

Then I run optim():

start.value <- c(beta0=0.01,beta1=0.01,beta2=0.01,beta3=0.01,alfa=0.01)
MLE_pogam<-optim(start.value,fn=po.gam,hessian=FALSE)
MLE_pogam

The program has been running for more than 2 hours without producing any output. Do you have any suggestions for optimizing the log-likelihood function for data this large? Thank you!

I didn't go into the details, but perhaps you need to look for a more optimized method. From the documentation of optim(): "The default method is an implementation of that of Nelder and Mead (1965), that uses only function values and is robust but relatively slow." – user2974951
The loops at the start of the function calculate lambda 5908*5 times, but the expression is independent of i and t, so the result is exactly the same each time and the for loops could be removed. – Miff

1 Answer


Try the pso package. It implements the Particle Swarm Optimization algorithm, and pso::psoptim() is a drop-in replacement for optim(). It is a stochastic, derivative-free algorithm, so it should be comparatively fast on a problem like this.
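To illustrate the calling convention, here is a minimal, self-contained sketch using a toy quadratic objective rather than the question's po.gam (the bounds and control settings are purely illustrative):

```r
library(pso)  # install.packages("pso") if needed

# toy objective with the same signature optim() expects: numeric vector -> scalar
sphere <- function(p) sum((p - c(1, 2))^2)

set.seed(42)
fit <- psoptim(par = rep(NA, 2),   # NA par = random start within the bounds
               fn = sphere,
               lower = c(-5, -5), upper = c(5, 5),
               control = list(maxit = 500))
fit$par  # should be close to c(1, 2)
```

For po.gam you would pass start.value as `par` and choose lower/upper bounds that keep para[5] strictly positive, since the 1/para[5] terms are undefined at zero.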