8
votes

How does one perform a multivariate (multiple dependent variables) logistic regression in R?

I know you do this for linear regression, and this works

form <-cbind(A,B,C,D)~shopping_pt+price
mlm.model.1 <- lm(form, data = train)

But when I try the following (see below) for logistic regression, it does not work.

model.logistic <- glm(form, family=binomial(link=logit), data=train)

Thank you for your help.

To add, it appears that my code to do this with linear models above may not be correct. I am trying to what is outlined in this document, which some might find useful.

ftp://ftp.cis.upenn.edu/pub/datamining/public_html/ReadingGroup/papers/multiResponse.pdf

1
I might be wrong..but I think the code you have for lm just fits four separate models, one for each dependent variable. I mean, you can just fit four logistic models.Julián Urbano
I do not believe this is correct. It accounts for the correlation matrixes between the 4 dependent variables, and simultaneously predicts 4 models accounting for those interdependenies.blast00
Are you sure? Any reference for that?Julián Urbano
thanks for the links. The second one though mentions this: "The individual coefficients, as well as their standard errors will be the same as those produced by the multivariate regression. However, the OLS regressions will not produce multivariate results, nor will they allow for testing of coefficients across equations"Julián Urbano
You can do this with a generalized linear mixed model (GLMM) package if you 'stack' your data appropriately: MCMCglmm (see chapter 5 of the Course Notes) or lme4 (see rpubs.com/bbolker/3336) should work, although the multivariate examples given there are multivariate normal. If you give a reproducible example I will consider posting a worked answer ...Ben Bolker

1 Answers

3
votes

It seems to me that lm(cbind(A,B,C,D)~shopping_pt+price) just fits four different models for the four dependent variables. The second link you provide even mentions:

The individual coefficients, as well as their standard errors will be the same as those produced by the multivariate regression. However, the OLS regressions will not produce multivariate results, nor will they allow for testing of coefficients across equations.

Meaning that all estimates will be the same, you'll just have to predict four times; and hypotheses on the fitted coefficients are independent across models.

I just tried this example below, showing that it indeed seems like that:

> set.seed(0)
> x1 <- runif(10)
> x2 <- runif(10)
> y1 <- 2*x1 + 3*x2 + rnorm(10)
> y2 <- 4*x1 + 5*x2 + rnorm(10)
> mm <- lm(cbind(y1,y2)~x1+x2)
> m1 <- lm(y1~x1+x2)
> m2 <- lm(y2~x1+x2)
# If we look at mm, m1 and m2, we see that models are identical
# If we predict new data, they give the same estimates
> x1_ <- runif(10)
> x2_ <- runif(10)
> predict(mm, newdata=list(x1=x1_, x2=x2_))
          y1       y2
1  2.9714571 5.965774
2  2.7153855 5.327974
3  2.5101344 5.434516
4  1.3702441 3.853450
5  0.9447582 3.376867
6  2.3809256 5.051257
7  2.5782102 5.544434
8  3.1514895 6.156506
9  2.4421892 5.061288
10 1.6712042 4.470486
> predict(m1, newdata=list(x1=x1_, x2=x2_))
        1         2         3         4         5         6         7         8         9        10 
2.9714571 2.7153855 2.5101344 1.3702441 0.9447582 2.3809256 2.5782102 3.1514895 2.4421892 1.6712042 
> predict(m2, newdata=list(x1=x1_, x2=x2_))
       1        2        3        4        5        6        7        8        9       10 
5.965774 5.327974 5.434516 3.853450 3.376867 5.051257 5.544434 6.156506 5.061288 4.470486

So this suggests that you can just fit four logistic models separately.