2
votes

My data is binary with two linear independent variables. For both predictors, as they get bigger, there are more positive responses. I have plotted the data in a heatplot showing density of positive responses along the two variables. There are the most positive responses in the top right corner and negative responses in the bottom left, with a gradient change visible along both axes.

I would like to plot a line on the heatplot showing where a logistic regression model predicts that positive and negative responses are equally likely. (My model is of the form response~predictor1*predictor2+(1|participant).)

My question: How can I figure out the line based on this model at which the positive response rate is 0.5?

I tried using predict(), but that works the opposite way; I have to give it values for the factor rather than giving the response rate I want. I also tried using a function that I used before when I had only one predictor (function(x) (log(x/(1-x))-fixef(fit)[1])/fixef(fit)[2]), but I can only get single values out of that, not a line, and I can only get values for one predictor at a time.

1
Interesting question! It would be easier for people to help if you provide example data though, e.g. a similar logistic regression model fitted to one of the example datasets in R and the code for your plot. I think you can find the boundary line by finding the points where the fitted linear predictor (on the log odds scale) is 0, so you can use some pretty basic algebra to find an equation for the value of predictor2 that will satisfy that given some value of predictor1.Marius
In fact, I've just found someone has written up the algebra here: stats.stackexchange.com/a/159977/5443Marius

1 Answers

2
votes

Using a simple example logistic regression model fitted to the mtcars dataset, and the algebra described here, I can produce a heatmap with a decision boundary using:

library(ggplot2)
library(tidyverse)
data("mtcars")

m1 = glm(am ~ hp + wt, data = mtcars, family = binomial)

# Generate combinations of hp and wt across their observed range. Only
#   generating 50 values of each here, which is not a lot but since each
#   combination is included, you get 50 x 50 rows 
pred_df = expand.grid(
    hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50),
    wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
)
pred_df$pred_p = predict(m1, pred_df, type = "response")


# For a given value of hp (predictor1), find the value of
#   wt (predictor2) that will give predicted p = 0.5
find_boundary = function(hp_val, coefs) {
    beta_0 = coefs['(Intercept)']
    beta_1 = coefs['hp']
    beta_2 = coefs['wt']
    boundary_wt =  (-beta_0 - beta_1 * hp_val) / beta_2
}


# Find the boundary value of wt for each of the 50 values of hp
# Using the algebra in the linked question you can instead find
# the slope and intercept of the boundary, so you could potentially
# skip this step
boundary_df = pred_df %>%
    select(hp) %>%
    distinct %>%
    mutate(wt = find_boundary(hp, coef(m1)))


ggplot(pred_df, aes(x = hp, y = wt)) +
    geom_tile(aes(fill = pred_p)) +
    geom_line(data = boundary_df)

Producing:

enter image description here

Note that this only takes into account the fixed effects from the model, so if you want to somehow take into account random effects this could be more complex.