
I have a dataset with 17 questions (Q1 - Q17) and a categorical variable (Region).

> df[, c("Region", QUESTIONS)]
# A tibble: 963 x 18
   Region     Q1    Q2    Q3    Q4    Q5    Q6    Q7    Q8    Q9   Q10   Q11   Q12   Q13   Q14   Q15
   <chr>   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1 USA         0     1     0     0     0     0     0     0     0     0     0     0     0     0     0
 2 USA         8     8     8     8     6     8     8     0     5    10     7     0     0    10     8
 3 USA         9     8     7    10     8     4     8     0     5     8     8     8     2     7     6
 4 USA         4     2     5     4     3     3     2     0     1     0     0     0     3     2     0
 5 USA         2     6     7     5     6     2     9     0     6     7     3     0     0     8     5
 6 USA         6     6     8     1     2     0     4     0     0     4     0     6    10     0     1
 7 USA         5     2     7     8    10     9    10     8     6    10     1    10     4     6    10
 8 IE          6     6     5     5     6     5     6     3     6     7     6     6     7     7     4
 9 OCEANIA     8     8     6    10     5    10     5     1    10     4     0     1    10     9    10
10 USA         3     2     2     7     3     1     2     0     8     3     3     1     0     8     8
# ... with 953 more rows, and 2 more variables: Q16 <int>, Q17 <int>

I want to compare answers across regions, so I first melt df and then create a boxplot using ggplot.

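library(reshape2)  # assuming melt() comes from reshape2
library(ggplot2)
df1 <- melt(df[, c("Region", QUESTIONS)], id.vars = "Region")
ggplot(data = df1, aes(x = variable, y = value, fill = Region)) + geom_boxplot()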

Unfortunately, with 17 questions and 13 regions, the boxplot is so busy that it is virtually incomprehensible. How can I simplify it (say, by plotting only the mean and +/-1 standard error) so that it is legible? Alternatively, how can I generate 17 sets of boxplots (one per question; I do need all 17 questions), each of which shows all 13 regions?

Sincerely

Thomas Philips
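A minimal sketch of the mean +/-1 standard error summary asked about above, using dplyr, tidyr and ggplot2. It runs on hypothetical fake data (13 regions, 17 questions scored 0 to 10); with the real data, `fake` would be replaced by `df[, c("Region", QUESTIONS)]`. The standard error is computed as sd / sqrt(n) per Region and Question, and the data is assumed to have no missing values.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical fake data standing in for df[, c("Region", QUESTIONS)]:
# 13 regions, 17 questions, integer answers from 0 to 10
set.seed(1)
questions <- paste0("Q", 1:17)
fake <- data.frame(Region = sample(paste0("R", 1:13), 500, replace = TRUE))
for (q in questions) fake[[q]] <- sample(0:10, 500, replace = TRUE)

# Long format, then mean and standard error for every Region x Question pair
summ <- fake %>%
  pivot_longer(cols = -Region, names_to = "Question", values_to = "Value") %>%
  group_by(Question, Region) %>%
  summarise(mean = mean(Value),
            se = sd(Value) / sqrt(n()),
            .groups = "drop") %>%
  # keep Q1..Q17 in numeric order instead of Q1, Q10, Q11, ...
  mutate(Question = factor(Question, levels = questions))

# One panel per question; each point is a region's mean, the bar spans +/-1 SE
p <- ggplot(summ, aes(x = Region, y = mean, ymin = mean - se, ymax = mean + se)) +
  geom_pointrange() +
  facet_wrap(~ Question) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
```

Printing `p` draws the 17 panels. Replacing each box with a single point and bar keeps all 13 regions readable inside a panel; setting the factor levels matters because ggplot2 would otherwise order the facets alphabetically.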


1 Answer


You probably want facet_wrap(). Here I use some simplified fake data to illustrate the idea.

library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(12234)
df <- data.frame(Region = sample(LETTERS[1:10], 100, TRUE),
                 Q1 = rpois(100, 4),
                 Q2 = rpois(100, 3),
                 Q3 = round(runif(100, 1, 10)),
                 Q4 = round(runif(100, 1, 10)),
                 Q5 = round(10 * rnorm(100)))
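# Reshape to long format: one row per (Region, Question) answer
df %>% pivot_longer(cols = -Region, names_to = "Question", values_to = "Value") %>%
  ggplot() +
  geom_boxplot(aes(x = Region, y = Value, fill = Region)) +
  # one panel per question, each showing every region
  facet_wrap(~ Question)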

[Figure: faceted boxplots]
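If you would rather have separate plots than panels (for example, to save each question to its own file), a minimal sketch is to split the long data by question and build one boxplot per piece. The data below is hypothetical fake data in the same shape as the example above, trimmed to three questions.

```r
library(tidyr)
library(ggplot2)

# Hypothetical fake data, same shape as the example above
set.seed(12234)
df <- data.frame(Region = sample(LETTERS[1:10], 100, TRUE),
                 Q1 = rpois(100, 4),
                 Q2 = rpois(100, 3),
                 Q3 = round(runif(100, 1, 10)))

long <- pivot_longer(df, cols = -Region, names_to = "Question", values_to = "Value")

# One standalone boxplot per question, each showing every region;
# split() names the list elements after the questions ("Q1", "Q2", ...)
plots <- lapply(split(long, long$Question), function(d) {
  ggplot(d, aes(x = Region, y = Value, fill = Region)) +
    geom_boxplot() +
    labs(title = unique(d$Question))
})
```

`print(plots[["Q1"]])` draws a single question, and looping over `plots` with `ggsave()` would write each one to its own file.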