10
votes

In glm in R, the default link functions for the Gamma family are inverse,identity and log. Now for my particular question, I need to use gamma regression with response Y and a modified link function in the form of log(E(Y)-1)). Thus, I consider modifying some glm-related functions in R. There are several functions that may be relevant, and I am seeking help for anyone who had previous experience in doing this.

For example, the functions Gamma is defined as

function (link = "inverse") 
{
  linktemp <- substitute(link)
  if (!is.character(linktemp)) 
    linktemp <- deparse(linktemp)
  okLinks <- c("inverse", "log", "identity")
  if (linktemp %in% okLinks) 
    stats <- make.link(linktemp)
  else if (is.character(link)) 
    stats <- make.link(link)
  else {
    if (inherits(link, "link-glm")) {
      stats <- link
      if (!is.null(stats$name)) 
        linktemp <- stats$name
    }
    else {
      stop(gettextf("link \"%s\" not available for gamma family; available links are %s", 
                    linktemp, paste(sQuote(okLinks), collapse = ", ")), 
           domain = NA)
    }
  }
  variance <- function(mu) mu^2
  validmu <- function(mu) all(mu > 0)
  dev.resids <- function(y, mu, wt) -2 * wt * (log(ifelse(y == 
                                                            0, 1, y/mu)) - (y - mu)/mu)
  aic <- function(y, n, mu, wt, dev) {
    n <- sum(wt)
    disp <- dev/n
    -2 * sum(dgamma(y, 1/disp, scale = mu * disp, log = TRUE) * 
               wt) + 2
  }
  initialize <- expression({
    if (any(y <= 0)) stop("non-positive values not allowed for the 'gamma' family")
    n <- rep.int(1, nobs)
    mustart <- y
  })
  simfun <- function(object, nsim) {
    wts <- object$prior.weights
    if (any(wts != 1)) 
      message("using weights as shape parameters")
    ftd <- fitted(object)
    shape <- MASS::gamma.shape(object)$alpha * wts
    rgamma(nsim * length(ftd), shape = shape, rate = shape/ftd)
  }
  structure(list(family = "Gamma", link = linktemp, linkfun = stats$linkfun, 
                 linkinv = stats$linkinv, variance = variance, dev.resids = dev.resids, 
                 aic = aic, mu.eta = stats$mu.eta, initialize = initialize, 
                 validmu = validmu, valideta = stats$valideta, simulate = simfun), 
            class = "family")
}

Also, in order to use the command glm(y ~ log(mu), family = Gamma(link = MyLink)), do I also need to modify the glm.fit function? Thank you!


Updates and New Question

According to @Ben Bolker's comments, we need to write a new link function called vlog (with real name "log(exp(y)-1)"). I find that the make.link function might be responsible for such a modification. It is defined as

function (link) 
{
  switch(link, logit = {
    linkfun <- function(mu) .Call(C_logit_link, mu)
    linkinv <- function(eta) .Call(C_logit_linkinv, eta)
    mu.eta <- function(eta) .Call(C_logit_mu_eta, eta)
    valideta <- function(eta) TRUE
  }, 

  ...

  }, log = {
    linkfun <- function(mu) log(mu)
    linkinv <- function(eta) pmax(exp(eta), .Machine$double.eps)
    mu.eta <- function(eta) pmax(exp(eta), .Machine$double.eps)
    valideta <- function(eta) TRUE
  }, 

  ...

  structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta, 
                 valideta = valideta, name = link), class = "link-glm")
}

My question is: if we want to permanently add this link function vlog to glm, so that in each R session, we can use glm(y~x,family=Gamma(link="log(exp(y)-1)")) directly, shall we use the fix(make.link) and then add the definition of vlog to its body? Or fix() can only do that in current R session? Thanks again!

One more thing: I realize that maybe another function needs to be modified. It is Gamma, defined as

function (link = "inverse") 
{
  linktemp <- substitute(link)
  if (!is.character(linktemp)) 
    linktemp <- deparse(linktemp)
  okLinks <- c("inverse", "log", "identity")
  if (linktemp %in% okLinks) 
    stats <- make.link(linktemp)
  else if (is.character(link)) 
    stats <- make.link(link)
  else {
    if (inherits(link, "link-glm")) {
      stats <- link
      if (!is.null(stats$name)) 
        linktemp <- stats$name
    }
    else {
      stop(gettextf("link \"%s\" not available for gamma family; available links are %s", 
                    linktemp, paste(sQuote(okLinks), collapse = ", ")), 
           domain = NA)
    }
  }
  variance <- function(mu) mu^2
  validmu <- function(mu) all(mu > 0)
  dev.resids <- function(y, mu, wt) -2 * wt * (log(ifelse(y == 
                                                            0, 1, y/mu)) - (y - mu)/mu)
  aic <- function(y, n, mu, wt, dev) {
    n <- sum(wt)
    disp <- dev/n
    -2 * sum(dgamma(y, 1/disp, scale = mu * disp, log = TRUE) * 
               wt) + 2
  }
  initialize <- expression({
    if (any(y <= 0)) stop("non-positive values not allowed for the 'gamma' family")
    n <- rep.int(1, nobs)
    mustart <- y
  })
  simfun <- function(object, nsim) {
    wts <- object$prior.weights
    if (any(wts != 1)) 
      message("using weights as shape parameters")
    ftd <- fitted(object)
    shape <- MASS::gamma.shape(object)$alpha * wts
    rgamma(nsim * length(ftd), shape = shape, rate = shape/ftd)
  }
  structure(list(family = "Gamma", link = linktemp, linkfun = stats$linkfun, 
                 linkinv = stats$linkinv, variance = variance, dev.resids = dev.resids, 
                 aic = aic, mu.eta = stats$mu.eta, initialize = initialize, 
                 validmu = validmu, valideta = stats$valideta, simulate = simfun), 
            class = "family")
}

I think we also need to revise

okLinks <- c("inverse", "log", "identity")

to

okLinks <- c("inverse", "log", "identity", "log(exp(y)-1)")

?

2
I don't understand all this extra complexity. I show the example below where the alternate-link model can be fitted via glm(...,family=Gamma(link=vlog()) as long as vlog has been defined. You can put vlog in a .R file and source() it in every session, or create a small package that defines the function. If you want you can also put it in your R profile, but it would probably be more transparent to just source("vlog.R") in every R script where you are going to use it. I don't think Gamma() needs to be modified (again, see my answer).Ben Bolker
I guess if you insist on calling the link function by name you would have to do all that extra hacking you describe above, but I don't see what's wrong with family=Gamma(link=vlog()) ...Ben Bolker
@BenBolker: Yes, I tried your codes and they work perfectly! Maybe my extra question is more general about fixing an R function to include user-defined options permanently. I will include the vlog function in my package. Thanks again for your help ;-)alittleboy
I would say you should copy the function from the R source code (so that you get any relevant comments included) and incorporate it in a package you load, which will mask the base versions. That's a sufficiently different task that you should probably pose it as a separate question.Ben Bolker
@BenBolker: yep -- I will post it as a separate question ;-)alittleboy

2 Answers

13
votes

I'm basically following the form of the example in ?family which shows a user-specified link of the form qlogis(mu^(1/days)).

We want a link of the form eta = log(exp(y)-1) (so the inverse link is y=log(exp(eta)+1), and mu.eta = dy/d(eta) = 1/(1+exp(-eta))

vlog <- function() {
    ## link
    linkfun <- function(y) log(exp(y)-1)
    ## inverse link
    linkinv <- function(eta)  log(exp(eta)+1)
    ## derivative of invlink wrt eta
    mu.eta <- function(eta) { 1/(exp(-eta) + 1) }
    valideta <- function(eta) TRUE
    link <- "log(exp(y)-1)"
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta, 
                   name = link),
              class = "link-glm")
}

Basic checks:

vv <- vlog()
vv$linkfun(vv$linkinv(27))  ## check invertibility
library("numDeriv")
all.equal(grad(vv$linkinv,2),vv$mu.eta(2))  ## check derivative

Example:

set.seed(101)
n <- 1000                       
x <- runif(n)
sh <- 2                        
y <- rgamma(n,scale=vv$linkinv(2+3*x)/sh,shape=sh)
glm(y~x,family=Gamma(link=vv))                       
## 
## Call:  glm(formula = y ~ x, family = Gamma(link = vv))
## 
## Coefficients:
## (Intercept)            x  
##       1.956        3.083  
## 
## Degrees of Freedom: 999 Total (i.e. Null);  998 Residual
## Null Deviance:       642.2 
## Residual Deviance: 581.8     AIC: 4268 
## 
3
votes

Try gnlm::gnlr(). Using x, y, sh from Ben Bolker's example:

library(gnlm)
# custom link / inverse 
custom_inv <- function(eta)  log(exp(eta)+1)
library(gnlm)
gnlr(y=y,
     distribution = "gamma",
     mu = ~ custom_inv(beta0 + beta1*x),
     pmu = list(beta0=0, beta1=0),
     pshape=sh
)
# Location parameters:
#        estimate      se
# beta0     1.956  0.1334
# beta1     3.083  0.2919
# 
# Shape parameters:
#       estimate       se
# p[1]     0.625  0.04133