0
votes

I tried to follow this example modify glm... user specificed link function in r but am getting errors. I have binary data, and would like to change the link function from "logit" to a negative exponential link. I want to predict the

probability of success(p) = 1-exp(linear predictor)

The reason I need this link instead of one of the built-in links is that p increases in a convex manner between 0 and 0.5, but the "logit", "cloglog", "probit", and "cauchy" only allow a concave shape. See attached photo for reference: predicted p vs binned observations

Simulate data

location<-as.character(LETTERS[rep(seq(from=1,to=23),30)])
success<-rbinom(n=690, size=1, prob=0.15)
df<-data.frame(location,success)
df$random_var<-rnorm(690,5,3)
df$seedling_size<-abs((0.1+df$success)^(1/df$random_var))
df<-df[order(df$location)]

Create custom link function. Note: eta = linear predictor, mu = probability

negex<-function(){
##link
linkfun<-function(mu) log(-mu+1)
linkinv<-function(eta) 1-exp(eta)
## derivative of inverse link with respect to eta
mu.eta<-function(eta)-exp(eta)
valideta<-function(eta) TRUE
link<-"log(-mu+1)"
structure(list(linkfun=linkfun,linkinv=linkinv,
             mu.eta=mu.eta,valideta=valideta,
             name=link),
        class="link-glm")
}

Model success as a function of seedling size

negexp<-negex()

model1<-glm(success~seedling_size,family=binomial(link=negexp),data=df)

Error: no valid set of coefficients has been found: please supply starting values

Model using glmer (My ultimate goal)

model2<-glmer(success~seedling_size+ (1|location),family=binomial(link=negexp),data=df)

Error in (function (fr, X, reTrms, family, nAGQ = 1L, verbose = 0L, maxit = 100L, : (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate

I get different error messages, but I think the problem is the same regardless of whether using glmer or glm, and that is that my link function is wrong somehow.

1

1 Answers

1
votes

I found the answer. Most helpful was this R thread from 2016. There were 2 issues. First, my link fuction was wrong. I revised it to this:

negex <- function() 
 { 
 linkfun <- function(mu) -log(1-mu) 
 linkinv <- function(eta) 1-exp(-eta) 
 mu.eta <- function(eta) exp(-eta) 
 valideta <- function(eta) all(is.finite(eta)&eta>0) 
 link <- paste0("negexp") 
 structure(list(linkfun = linkfun, linkinv = linkinv, 
             mu.eta = mu.eta, valideta = valideta, name = link), 
        class = "link-glm") 
} 

Second, the model required specific starting values. These will be unique to your data. Here is the first few lines of the data that I actually found the solution to:

   site plot sub_plot oak_success oak_o1_gt05ft..1
  0001   10        3           1                0
  0001   12        2           0                0
  0001   12        3           0                0
  0001   12        4           0                0
  0001   13        4           0                0

I don't know how to post the full data to this site, but if someone wants it to run the example, shoot me an email: [email protected]

negexp<-negex()

Hopefully this helps someone in the future, because I found no other examples of this being solved on stack overflow or online. Using the new starting values, I was able to get the model to run:

starting_values<-c(1,0) #1 for the intercept and 0 for the slope
h_gt05_solo_negex2<-glm(oak_success~ oak_o1_gt05ft..1 , 
                    family=binomial(link=negexp),start=starting_values,data=rocdf)
summary(h_gt05_solo_negex2)
Call:
glm(formula = oak_success ~ oak_o1_gt05ft..1, family = binomial(link = negexp), 
data = lt40, start = starting_values)

Deviance Residuals: 
Min       1Q   Median       3Q      Max  
-1.3808  -0.4174  -0.2637  -0.2637   2.5985  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)      0.034774   0.005484   6.341 2.28e-10 ***
oak_o1_gt05ft..1 0.023253   0.002187  10.635  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1416.9  on 2078  degrees of freedom
Residual deviance: 1213.5  on 2077  degrees of freedom
AIC: 1217.5

Number of Fisher Scoring iterations: 6

There were some issues with convergence. As seedling heights (oak_o1_gt05ft..1) got above 40ft, the parameter estimates became unreliable convergence issues. I had very few observations in this range, so I restricted the data to observations were the predictor was <40ft and re-ran the model. I also included "site" (same as "location" in the simulated data)). What you see in this figure are the predictions of oak success with respect to oak seedling heights for each site/location (black circles), the binned observations of successes/samples (large green dots) and the prediction of success probability without a site factor (blue line). It looks like the slope of the seedling size variable is more accurate when site is factored in.

Unfortunately, I was not able to get this model to run in glmer, so site had to be included as a fixed effect, thus, the standard errors and slope estimates for oak seedling height might be a bit conservative.