
I would like to plot the regression line from a glm model (written below). Ideally I'd like to plot it over the observed data, but I haven't been able to adapt the code I've found elsewhere (e.g. predict.glm, Plot predicted probabilities and confidence intervals in r).

Here is a subset of the data:

     Pos   Tot   Age
    <int> <int> <int>
1     1    11     1
2     0     1     1
3     3     3     1
4     1     2     1
5     5     7     1
47   13    16     4
48    9     9     4
49    9    10     4
50   14    14     4    
158   1     3     2
159   3     5     2
160   0     7     2
161   9    12     2
162   0     2     2
209   0     1     3
210   1     2     3
211   1     1     3
212   2     2     3

Each row represents a unique location. I removed the location column to de-identify the data.

Here is my model:

 model1 <- glm(cbind(Tot - Pos, Pos) ~ -1+Age,
            family = binomial(link = "log"), data = data.frame)

My goal is to plot the predicted probabilities of different glm models for visual comparison, but for now I can't even figure out how to plot my simplest model.

Edit: Because the response is a two-column matrix, I don't think there is a way to plot this in ggplot. Can someone confirm?

I had tried to plot in ggplot, but due to the model response being a two-column matrix, the aesthetics of the plot and of the model did not match:

ggplot(data.frame, aes(x = Age, y = Pos/Tot)) +
geom_jitter(width = 0.05, height = 0.05) +
geom_smooth(method = glm, formula = cbind(Tot-Pos, Pos) ~ -1+Age, se = FALSE)

which returns a scatter plot of the observed values but also gives me this warning:

Warning message:
Computation failed in `stat_smooth()`:
object 'Tot' not found 

So I'm now trying to figure out how to plot using the predict function, which I've never done before.

This is what I have so far, adapting from here:

 newdata<-data.frame(Age = 1:4)
 plot(1:4, predict(model1, newdata, type="link"))

How do I add 95% confidence intervals and transform the data back to a probability scale of 0-1 on the y-axis?

Thanks very much

Formula edited. - Emma
It may be possible to use stat_smooth, but the straightforward way is to use the predict function to generate a data frame of probabilities, and then plot it as you would any other data frame. - Gregor Thomas
It would be helpful to see your best attempt with ggplot - it would let us know where your confusion is to help you learn what needs fixing. - Gregor Thomas
@Gregor best ggplot attempt added to my question. - Emma

1 Answer


Here's how to generate the predictions:

pd = data.frame(Age = 1:4)

# use type = "response" for probability-scale predictions    
preds = predict(model1, newdata = pd, type = "response", se.fit = TRUE)
pd$fit = preds$fit
pd$se = preds$se.fit

And then plot:

ggplot(dd, aes(x = Age, y = Pos / Tot)) +
  geom_point(position = position_jitter(width = 0.05, height = 0.05)) +
  geom_ribbon(data = pd, aes(y = fit, ymin = fit - 1.96 * se, ymax = fit + 1.96 * se),
              fill = "blue", alpha = 0.3) +
  geom_line(data = pd, aes(y = fit)) 


From the plot, we can see that the model and the plotted data are somewhat contradictory. This is because the first column of the cbind() response counts the successes, so your model is specified as predicting the probability (Tot - Pos) / Tot, but your plot is showing the complement, Pos / Tot. I'd recommend changing one to match the other.
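One way to check which proportion the model describes is to put the fitted values next to the pooled observed proportions for each Age. This is only a base-R sketch: it repeats the sample data and model so it runs on its own, and the `obs` data frame and its column names are made up here for illustration.

```r
# Sample data from the question (row names dropped)
dd <- read.table(header = TRUE, text = "Pos Tot Age
1 11 1
0 1 1
3 3 1
1 2 1
5 7 1
13 16 4
9 9 4
9 10 4
14 14 4
1 3 2
3 5 2
0 7 2
9 12 2
0 2 2
0 1 3
1 2 3
1 1 3
2 2 3")

# Model from the question: first cbind() column = successes (Tot - Pos)
model1 <- glm(cbind(Tot - Pos, Pos) ~ -1 + Age,
              family = binomial(link = "log"), data = dd)

# Pool the counts within each Age and compute both candidate proportions
obs <- aggregate(cbind(Pos, Tot) ~ Age, data = dd, FUN = sum)
obs$prop_pos     <- obs$Pos / obs$Tot
obs$prop_not_pos <- (obs$Tot - obs$Pos) / obs$Tot

# Fitted probabilities on the response scale, one per Age
obs$fit <- as.numeric(predict(model1, newdata = obs, type = "response"))

print(obs)
```

If `fit` moves with `prop_not_pos` rather than `prop_pos`, then plotting `y = (Tot - Pos) / Tot`, or plotting `1 - fit` against `Pos / Tot`, would bring the line and the points into agreement.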


Using this data:

dd = read.table(header = TRUE, text = "Pos   Tot   Age
1     1    11     1
2     0     1     1
3     3     3     1
4     1     2     1
5     5     7     1
47   13    16     4
48    9     9     4
49    9    10     4
50   14    14     4    
158   1     3     2
159   3     5     2
160   0     7     2
161   9    12     2
162   0     2     2
209   0     1     3
210   1     2     3
211   1     1     3
212   2     2     3")

And the model from your question:

model1 <- glm(cbind(Tot - Pos, Pos) ~ -1+Age,
        family = binomial(link = "log"), data = dd)
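
The ribbon in the plot is fit ± 1.96 * se computed on the response scale, which can stray outside the range of a probability. A common alternative is to build the interval on the link scale and back-transform it with the family's inverse link. The sketch below repeats the data and model so it runs on its own; `pd_link`, `lwr` and `upr` are names chosen here for illustration, and 1.96 is the usual normal approximation for a 95% interval.

```r
# Sample data from the question (row names dropped)
dd <- read.table(header = TRUE, text = "Pos Tot Age
1 11 1
0 1 1
3 3 1
1 2 1
5 7 1
13 16 4
9 9 4
9 10 4
14 14 4
1 3 2
3 5 2
0 7 2
9 12 2
0 2 2
0 1 3
1 2 3
1 1 3
2 2 3")

model1 <- glm(cbind(Tot - Pos, Pos) ~ -1 + Age,
              family = binomial(link = "log"), data = dd)

pd_link <- data.frame(Age = 1:4)

# Predictions and standard errors on the link (here: log) scale
lp <- predict(model1, newdata = pd_link, type = "link", se.fit = TRUE)

# Inverse link taken from the fitted model; for link = "log" this is exp()
inv <- family(model1)$linkinv

pd_link$fit <- inv(lp$fit)
pd_link$lwr <- inv(lp$fit - 1.96 * lp$se.fit)
pd_link$upr <- inv(lp$fit + 1.96 * lp$se.fit)

print(pd_link)
```

With a log link the back-transformed bounds stay positive, but the upper bound is not guaranteed to stay below 1. The columns can be used in the plot in place of the response-scale ones, e.g. geom_ribbon(data = pd_link, aes(y = fit, ymin = lwr, ymax = upr)).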