I want to run a multinomial regression to get the average frequency of each choice of a close question, split by a factor (gender: male/female).
Background
I want to compare 4 types of cheeses to measure each one's popularity, out of 4 possibilities: cheddar, mozzarella, gouda, and brie. I go out and ask 200 people for their preferred cheese. Each person selects only one choice out of the 4 types. I end up collecting some demographics information as well, including gender, age, and weight.
Upon finishing data collection, I want to see the proportion of popularity of each cheese type (together they sum to 100%). Since I want to control for gender
, age
, and weight
, I think that a multinomial regression is suitable here.
But I'm also very interested in seeing how results differ between males and females, and I want to include gender
in my model as a factor. How can I generate a twofold prediction based on my (multinomial) model, that would get predicted values for females and males separately so I could compare between the two gender levels?
Data
library(truncnorm)
library(tidyverse)
set.seed(999)
cheese_df <-
tibble(
age = round(rtruncnorm(
n = 200,
a = 20,
b = 80,
mean = 25,
sd = 25.09
)),
cheese_response = as_factor(sample(
c("cheddar", "mozzarella", "gouda", "brie"),
size = 200,
replace = TRUE
)),
gender = sample(c(0, 1), size = 200, replace = TRUE),
weight = rtruncnorm(
n = 200,
a = 40,
b = 120,
mean = 70,
sd = 25.09
)
)
> cheese_df
## # A tibble: 200 x 4
## age cheese_response gender weight
## <dbl> <fct> <dbl> <dbl>
## 1 45 cheddar 0 62.2
## 2 32 cheddar 0 45.0
## 3 58 cheddar 1 87.6
## 4 28 brie 0 68.8
## 5 49 gouda 0 88.2
## 6 29 brie 1 74.5
## 7 49 cheddar 0 74.0
## 8 27 gouda 1 90.3
## 9 28 brie 0 56.5
## 10 48 mozzarella 0 72.9
## # ... with 190 more rows
If I just want to run a multinomial regression and control for age, gender, and weight **without splitting by gender** I could do:
library(nnet)
library(effects)
fit <- nnet::multinom(cheese_response ~ age + gender + weight, data = cheese_df)
average_person_for_control <-
c(
age = 50,
gender = 0.5,
weight = 75
)
prediction <-
effects::Effect("age",
fit,
given.values = average_person_for_control,
xlevels = list(age =
c(45, 90)))
proportions_for_plot <-
data.frame(prediction$prob, prediction$lower.prob, prediction$upper.prob) %>%
slice(1) %>%
pivot_longer(., cols = everything(),
names_to = c(".value", "response"),
names_pattern = "(.*)\\.(.*$)") %>%
rename("lower_ci" = "L.prob",
"upper_ci" = "U.prob",
"estimate" = "prob")
ggplot(proportions_for_plot, aes(x = reorder(response, -estimate), y = estimate)) +
geom_bar(stat = "identity", width = 0.7, fill = "darkgreen") +
geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci),
width = 0.2) +
geom_text(aes(label = paste0(100*round(estimate,2), "%")),
vjust = 1.6,
color = "white", size = 3) +
xlab("cheese type") +
ylab("proportion of people choosing this type")
However, I'm interested in generating the same bar plot only that it would split bars male vs female
This is the kind of plot I'm trying to get
(ignore the values in this demo)
One approach would be to subset the data by gender, run the same model on each subset, generate two barplots and unite them. However, I want to incorporate gender
within the model as a factor, and only then output the split bar plot. This is partially taken care of, as gender
is already part of the model:
fit <- nnet::multinom(cheese_response ~ age + gender + weight, data = cheese_df)
.
Still, as far as splitting the predictions by gender, in order to compare them side-by-side in a bar plot, I run into trouble. This is because effects::Effect()
accepts only a vector into its given.values
argument. Otherwise, I would have done something like the following to feed the prediction (like I would do if I were to use predict
):
control_by_gender <-
expand.grid(
age = 50,
weight = 75,
gender = c(0, 1)
)
> control_by_gender
## age weight gender
## 1 50 75 0
## 2 50 75 1
Any idea how I could get such a multiple -- rather than focal -- prediction when working on a multinomial model object as seen above? My ultimate goal is the split-by-gender bar plot, like the demo above. I've been using Effects::effect
for generating the prediction but am open to any substitute that could do the multiple prediction trick.
slice(1)
? It seems like you are just getting rid of one age category. – andrew_reeceslice(1)
is becausexlevels
takes 2 values forage
:45
and90
. However, it's a workaround since I'm not interested in90
at all. But try including only a single value forxlevels
(e.g.45
) and see the messy output it gives. So I ended up including 2 values forxlevels
only to get rid of one later (byslice
). If you know a more elegant method I'll be grateful. The only reason I useeffects::Effect
is to get a focal prediction given single values forage
,weight
, andgender
. On top of that comes the current post of generating 2 predictions bygender
. – Emman