1
votes

I'm new to R and ggplot2 for data analysis and am trying to figure out how to get my data into the format ggplot2 expects. The data is a set of values for 5 different categories, and I want a stacked bar graph in which each bar is split into 3 sections based on value, e.g. small, medium, and large, using arbitrary cutoffs. It should resemble the 100% stacked bar graph in Excel, where the proportions in each bar add up to 1 on the y axis. There is a fair amount of data (~1500 observations), in case that matters.

Here is a sample of what the data looks like (it has approx. 1000 observations per column). I've included an Excel screenshot because I don't know whether the dput output below worked.

dput(sample-data)

Similar to this image, but the proportions are determined by the arbitrary data cutoffs and there are only 3 of them.

2
Hi there and welcome to SO. Take a look at How to Ask for hints. It's a good start to give some data and make a great reproducible example. - Martin Gal
Are the cutoffs equal for all categories? - Rui Barradas
@RuiBarradas yes! It would be the same 2 cutoffs for all the different categories. - Eryn Bugbee
That's an image, not the output of dput. But anyway, my comment to my answer should work. - Rui Barradas

2 Answers

2
votes

This sort of problem is usually a data reformatting problem. See reshaping data.frame from wide to long format.
The following code uses the built-in iris data set, which has 4 numeric columns, to plot a bar graph with the data values cut into levels after reshaping the data.

The values in each category are first rescaled to [0, 1], so I have chosen cutoff points 0.2 and 0.7, but any other numbers in (0, 1) will do. The cutoff vector is brks and the level names are in labls.

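library(tidyverse)

data(iris)

# interval boundaries on the rescaled [0, 1] values and the level names
brks <- c(0, 0.2, 0.7, 1)
labls <- c('Small', 'Medium', 'Large')

iris[-5] %>%                      # drop the non-numeric Species column
  # wide to long: one row per (Category, Value) pair
  pivot_longer(
    cols = everything(),
    names_to = 'Category',
    values_to = 'Value'
  ) %>%
  group_by(Category) %>%
  # rescale within each category, then cut into the three levels
  mutate(Value = (Value - min(Value))/diff(range(Value)),
         Level = cut(Value, breaks = brks, labels = labls,
                     include.lowest = TRUE, ordered_result = TRUE)) %>%
  ggplot(aes(Category, fill = Level)) +
  # position_fill() scales every bar to height 1, like Excel's 100% stacked bar
  geom_bar(stat = 'count', position = position_fill()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))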
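
To check the numbers behind the bars, the same proportions can be tabulated without plotting. This is only a sketch in base R, assuming the same brks and labls as above; rescale01 is a hypothetical helper name, not part of any package.

```r
# Assumed to mirror the pipeline above: rescale each numeric column of
# iris to [0, 1], cut at the same breaks, then compute the level shares.
brks  <- c(0, 0.2, 0.7, 1)
labls <- c("Small", "Medium", "Large")

# hypothetical helper: min-max rescaling of a numeric vector
rescale01 <- function(x) (x - min(x)) / diff(range(x))

props <- sapply(iris[-5], function(x) {
  lvl <- cut(rescale01(x), breaks = brks, labels = labls,
             include.lowest = TRUE)
  prop.table(table(lvl))   # share of observations in each level
})

# rows are levels, columns are categories; each column sums to 1
round(props, 2)
```

Each column of props should correspond to the segment heights of one bar in the plot.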


0
votes

Here's a solution requiring no data reformatting.

The diamonds dataset comes with ggplot2. Column "color" is categorical, column "price" is numeric:

library(ggplot2)

ggplot(diamonds) + 
    geom_bar(aes(x = color, fill = cut(price, 3, labels = c("low", "mid", "high"))),
             position = "fill") +
    labs(fill = "price")
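
Note that cut(price, 3) splits the price range into 3 equal-width intervals. If the cutoffs are meant to be arbitrary values, as in the question, an explicit breaks vector could be passed instead. A minimal sketch, where the cutoffs 1000 and 5000 and the names price_breaks and price_labels are made up for illustration:

```r
library(ggplot2)

# Hypothetical cutoffs on the raw price scale; substitute your own two values.
# -Inf and Inf make sure every observation falls into some interval.
price_breaks <- c(-Inf, 1000, 5000, Inf)
price_labels <- c("low", "mid", "high")

p <- ggplot(diamonds) +
  geom_bar(aes(x = color,
               fill = cut(price, breaks = price_breaks, labels = price_labels)),
           position = "fill") +
  labs(y = "proportion", fill = "price")

p
```

With the default right = TRUE, a value equal to a cutoff falls into the lower interval, so a price of exactly 1000 counts as "low" here.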
