The Background

I am using the plotly API in R to create two linked plots. The first is a scatter plot and the second is a bar chart that should show the percentage of the current selection belonging to each category. I can't make the percentages behave as expected.

The problem

The plots render correctly and the interactive selection works fine. When I select a set of data points in the scatter plot, I would like to see the percentage of that selection that belongs to each category. Instead, each bar shows the percentage of the selected points in a category that belong to that same category, which is always 100%. I guess this is because I set color = ~c, which groups the data by category before the percentages are computed.
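To make the difference concrete, here is a minimal base R sketch of the two calculations. The four-point `selection` vector is made up for illustration, and the "per trace" calculation only mimics what appears to happen when each category is normalised on its own; it is not plotly's internal code.

```r
# Hypothetical selection: three points from category A, one from B
selection <- c("A", "A", "A", "B")

# Desired: each category's share of the whole selection
counts <- table(selection)
wanted <- 100 * counts / sum(counts)

# Observed: each category is normalised only against itself,
# so every group trivially makes up 100% of its own points
per_trace <- vapply(
  split(selection, selection),
  function(g) 100 * sum(g == g[1]) / length(g),
  numeric(1)
)

print(wanted)
print(per_trace)
```

The first result is what the bar chart should display; the second matches the constant 100% bars described above.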

The Example

Here is a reproducible example to follow. First create some dummy data.

library(plotly)

n = 1000
make_axis = function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
data = data.frame(
  x = make_axis(n),
  y = make_axis(n),
  c = rep(c("A", "B"), each = n)
)

Create a SharedData object and supply it to plot_ly() for the base plot.

shared_data = data %>% 
  highlight_key()

baseplot = plot_ly(shared_data)

Make the individual panels.

points = baseplot %>% 
  add_markers(x = ~x, y = ~y, color = ~c)

bars = baseplot %>% 
  add_histogram(x = ~c, color = ~c, histnorm = "percent", showlegend = FALSE) %>% 
  layout(barmode = "group")

And put them together in a linked subplot with selection and highlighting.

subplot(points, bars) %>% 
  layout(dragmode = "select") %>% 
  highlight("plotly_selected") 

Here is a screenshot of this to illustrate the problem. (Screenshot not reproduced here.)

An Aside

Incidentally, when I set histnorm = "" in add_histogram() I get closer to the expected behaviour, but I want percentages, not counts. Likewise, when I remove color = ~c I get closer to the expected behaviour, but I want the consistent colour scheme.

What I have tried

I have tried manually supplying the colours, but then some of the linked selection breaks. I have also tried creating a separate summarised data set from the SharedData object first and then plotting that, but again this breaks the linkage between the plots.

If anyone has any clues as to how to solve this I would be very grateful.

1 Answer

To me it seems the behaviour you are looking for isn't implemented in plotly.

Please see schema(): object ► traces ► histogram ► attributes ► histnorm ► description

However, here is the closest I was able to achieve via add_bars and preprocessing the data (sorry for adding data.table; you will be able to do the same in base R, it is just personal preference):

library(plotly)
library(data.table)

n = 1000
make_axis = function(n) c(rnorm(n, -1, 1), rnorm(n, 2, 0.25))
DT = data.table(
  x = make_axis(n),
  y = make_axis(n),
  c = rep(c("A", "B"), each = n)
)

DT[, grp_percent := rep(100/.N, .N), by = "c"]

shared_data = DT %>% 
  highlight_key()

baseplot = plot_ly(shared_data)
# Make the individual panels.

points = baseplot %>% 
  add_markers(x = ~x, y = ~y, color = ~c)

bars = baseplot %>% 
  add_bars(x = ~c, y = ~grp_percent, color = ~c, showlegend = FALSE) %>% 
  layout(barmode = "group")

subplot(points, bars) %>% 
  layout(dragmode = "select") %>% 
  highlight("plotly_selected")

Result

Unfortunately, the resulting hoverinfo isn't really desirable.
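For anyone who prefers to avoid data.table, the preprocessing step above could probably be written in base R along these lines. This is only a sketch: it assumes the same column names (`c`, `grp_percent`) and uses `ave()` to get each row's group size, which is my own substitution rather than anything from the original code.

```r
n <- 1000
data <- data.frame(
  c = rep(c("A", "B"), each = n)
)

# Number of rows in each row's category, repeated per row
group_size <- ave(seq_along(data$c), data$c, FUN = length)

# Each row contributes an equal share of its category's 100%
data$grp_percent <- 100 / group_size

# Sanity check: the shares within each category add up to 100
print(tapply(data$grp_percent, data$c, sum))
```

The resulting data frame can then be passed to highlight_key() in the same way as the data.table version.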