Creating a bar chart using bubbles-for-bars (geom_point) in ggplot2

Question

I am trying to create a bar chart in which each bar is replaced by circles stacked on top of each other. My dataset of company values is below:

> dput(my.data)
structure(list(name = c("JUU", "Lyf", "Inf", "Coi", "Tan", "Rob", 
"Out", "Zen", "Com", "Pel", "Con", "Soc", "Ind", "Cro", "GRA", 
"Osc", "Zoo", "Kat", "Pro", "Nia", "Uni", "23a", "Ope", "Upt", 
"Qua", "Aff", "App", "Ava", "Gus", "Zoc", "Apt", "Spr", "red", 
"War", "Car", "Buz", "Quo", "Squ", "Afi", "Jet", "C3 ", "Hea", 
"Hum", "Nex", "STX", "Roc", "Avi", "Off", "Gin", "App", "Doc", 
"Rub", "Thu", "Zet", "Med", "Rub", "Clo", "Mar", "Kab", "Dra", 
"Vox", "Des", "Ada", "Age", "Ken", "SMS", "Sup", "Sym", "Zoo", 
"Par"), value = c(38, 15, 10, 8.05, 6.7, 5.6, 5.51, 4.5, 4.4, 
4.15, 4, 4, 3.45, 3.35, 3.2, 3.2, 3.2, 3, 3, 2.7, 2.6, 2.5, 2.47, 
2.3, 2.27, 2, 2, 2, 2, 2, 1.86, 1.81, 1.8, 1.75, 1.74, 1.7, 1.7, 
1.7, 1.6, 1.6, 1.51, 1.5, 1.5, 1.5, 1.5, 1.41, 1.4, 1.39, 1.38, 
1.35, 1.32, 1.3, 1.3, 1.3, 1.25, 1.23, 1.2, 1.2, 1.18, 1.07, 
1.07, 1.02, 1, 1, 1, 1, 1, 1, 1, 0.08), year = c(2017, 2015, 
2016, 2017, 2015, 2017, 2017, 2015, 2016, 2017, 2015, 2015, 2017, 
2017, 2017, 2015, 2016, 2017, 2016, 2017, 2016, 2015, 2016, 2015, 
2016, 2017, 2017, 2015, 2015, 2015, 2015, 2015, 2017, 2015, 2017, 
2015, 2017, 2017, 2017, 2016, 2017, 2017, 2016, 2015, 2016, 2017, 
2017, 2016, 2017, 2015, 2015, 2017, 2015, 2015, 2015, 2017, 2017, 
2015, 2015, 2015, 2015, 2017, 2015, 2016, 2016, 2016, 2017, 2017, 
2017, 2017)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-70L))

> head(my.data, 10)
# A tibble: 10 x 3
   name  value  year
   <chr> <dbl> <dbl>
 1 JUU   38     2017
 2 Lyf   15     2015
 3 Inf   10     2016
 4 Coi    8.05  2017
 5 Tan    6.7   2015
 6 Rob    5.6   2017
 7 Out    5.51  2017
 8 Zen    4.5   2015
 9 Com    4.4   2016
10 Pel    4.15  2017

The graph should have 3 bars, one for each of the years 2015, 2016, and 2017. Each bar is made up of circles of varying sizes, with the largest circle on the bottom and the smallest on top. Using the value column, I compute the y values, cumValues, for these circles:

my.data <- my.data %>% 
  dplyr::arrange(desc(value)) %>% 
  dplyr::group_by(year) %>%
  # dplyr::mutate(cumValues = cumsum(valueEoy2018 ^ 0.5)) %>%
  dplyr::mutate(cumValues = cumsum(value)) %>%
  dplyr::ungroup()

> head(my.data %>% dplyr::filter(year == 2017))
# A tibble: 6 x 4
  name  value  year cumValues
  <chr> <dbl> <dbl>     <dbl>
1 JUU   38     2017      38  
2 Coi    8.05  2017      46.0
3 Rob    5.6   2017      51.6
4 Out    5.51  2017      57.2
5 Pel    4.15  2017      61.3
6 Ind    3.45  2017      64.8

... and lastly, I create the scatter plot:

  colorGold <- "gold"  # not defined in the original post; assumed here so the code runs
  minValue = min(my.data$value)
  maxValue = max(my.data$value)
  valueRange = c(minValue, maxValue)
  my.data %>%
    ggplot() +
    geom_point(aes(x = year, y = cumValues, size = value),
               alpha = 0.95, pch = 21, fill = colorGold, color = 'black') +
    geom_text(aes(x = year, y = cumValues, label = ifelse(value > 5, name, '')),
              size = 3, fontface = 'bold', hjust = 0.4, vjust = 1.) +
    scale_size_continuous(range = valueRange)

...and received the following:

This is close to what I want, but I am struggling in two respects. First, and most importantly, the circles overlap too much. I want the bottom of one circle to touch the top of the circle below it, or to overlap it just a bit, but not nearly as much as in the current graph.

I've tried using different functions when computing cumValues, and I've also tried ggplot2's scale_size_continuous function, to no avail. I've also tried scale_radius, but was unsuccessful with that as well.

Any help with this would be greatly appreciated, as I think this is a cool type of graph I am trying to build.

Interesting problem. I wonder if you can grab the coordinates from ggplot_build() and manipulate them to expand the y-axis? - Chase
scale_size_continuous(range = valueRange*0.35) gets closer with my settings. For an exact answer, you might consider `ggforce::geom_circle`, but you'd have to stretch and relabel your x and/or y, since currently your y range is much larger. - Jon Spring

Jon Spring · Accepted Answer · 2019-01-26T05:03:01

Here's an approach using ggforce::geom_circle to get precise control over the circle placement. The challenge I run into is that the original data spans 100+ units vertically (the cumulative values) but only 2 units horizontally (2015 to 2017), and ggforce::geom_circle draws each circle in data coordinates. So if we leave x and y as-is, you'll get either a very tall and narrow chart or very squished circles. My hack is to rescale the values from the start. (I also take their square root, so that each value is proportional to the circle's area and not to its radius.)
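To make the square-root step concrete, here is a small base R check using the four largest 2017 values from the question. It is only a sketch; the names diameter, area, area_per_value, and area_linear are illustrative and not from the original post.

```r
# Four largest 2017 values, copied from the question's data.
value <- c(38, 8.05, 5.6, 5.51)

# Diameter as in the code below: square root of the rescaled value.
diameter <- sqrt(value / 100)

# Area of a circle drawn with radius = diameter / 2.
area <- pi * (diameter / 2)^2

# Area divided by value is the same constant (pi / 400) for every circle,
# so area is proportional to value.
area_per_value <- area / value
print(area_per_value)

# For contrast: using value / 100 directly as the diameter makes the
# area grow with value^2, so the ratio is no longer constant.
area_linear <- pi * (value / 100 / 2)^2
print(area_linear / value)
```

With the square root, a company worth twice as much gets a circle with twice the area; without it, the circle would have four times the area.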

I wasn't sure if the y values would be used in the final chart. If you can drop them, then this should suffice, but if you need them then you could either manually change the labels on the y breaks or use a labeller to get them to display with the original scale.
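One possible reading of "manually change the labels" is sketched below in base R: put a break at the top edge of each circle and label it with the cumulative original value reached there. The names break_pos and break_lab are hypothetical. Because the y axis is shared, such labels would only be meaningful for one year's column (2017 here), which is a real limitation of this sketch.

```r
# Four largest 2017 values from the question's data, largest first.
value <- c(38, 8.05, 5.6, 5.51)

# Same transformation as in the answer: diameters on the rescaled axis.
value_sqrt <- sqrt(value / 100)

# Top edge of each circle on the rescaled y axis.
break_pos <- cumsum(value_sqrt)

# Cumulative original value reached at each of those top edges.
break_lab <- round(cumsum(value), 1)

print(data.frame(break_pos, break_lab))

# These could then be handed to the plot, for example:
#   scale_y_continuous(breaks = break_pos, labels = break_lab)
```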

library(dplyr)    # provides %>% and desc()
library(ggplot2)

my.data <- my.data %>% 
  dplyr::arrange(desc(value)) %>% 
  dplyr::group_by(year) %>%
  dplyr::mutate(value_sqrt = sqrt(value/100),
                cum_value_sqrt = cumsum(value_sqrt),
                height = cum_value_sqrt - value_sqrt/2) %>%
  dplyr::ungroup()

my.data %>%
  ggplot() +
  ggforce::geom_circle(aes(x0 = year, 
                           y0 = height, 
                           r = value_sqrt/2),
             alpha = 0.95, fill = "gold", color = 'black') +
  geom_text(aes(x = year, y = height, label = ifelse(value > 5, name, '')),
            size = 3, fontface = 'bold', hjust = 0.4, vjust = 1) +
  scale_x_continuous(breaks = 2015:2017, minor_breaks = NULL) +
  coord_equal(ratio = 1)
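As a sanity check on the placement arithmetic (centre at cum_value_sqrt - value_sqrt/2, radius value_sqrt/2), the base R sketch below confirms that each circle's bottom edge meets the top edge of the circle beneath it. It uses the four largest 2017 values from the question; the names centre, top, bottom, and gap are illustrative only.

```r
# Four largest 2017 values from the question's data, largest first.
value <- c(38, 8.05, 5.6, 5.51)

# Diameter and radius as used in the geom_circle() call.
diameter <- sqrt(value / 100)
radius <- diameter / 2

# Centre of each circle: running total of diameters minus its own radius.
centre <- cumsum(diameter) - radius

# Top and bottom edges of each circle.
top <- centre + radius
bottom <- centre - radius

# Gap between each circle's bottom and the top of the circle below it.
# Zero (up to floating-point error) means the circles just touch.
gap <- bottom[-1] - top[-length(top)]
print(gap)
```

For a little deliberate overlap, one option would be to shrink the step between centres slightly, for example by using cumsum(diameter * 0.95) when computing the centres while leaving the radius unchanged.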
