3
votes

Let me start by complimenting the HoloViews developers: this thing is pretty amazing. There are just a lot of pieces, and it is a bit hard to figure out how to put them all together to do what I want :).

I am trying to do linked multidimensional data plotting, i.e. I want several plots showing views of the same data along various dimensions. I then want to use the Bokeh selection tools to select data in one plot and see where it falls in the others. I also need to use Datashader, because my datasets are large.

This is what I have so far (running in a Jupyter notebook with Python 2):

import numpy as np
import pandas as pd
import holoviews as hv
import holoviews.operation.datashader as hvds
hv.notebook_extension('bokeh')
%opts Scatter [tools=['box_select', 'lasso_select']] (size=10 nonselection_color='red' color='blue') Layout [shared_axes=True shared_datasource=True]

# Create some data to plot
x1 = np.arange(0,10,1e-2)
x2 = np.arange(0,10,1e-2)
X1,X2 = np.meshgrid(x1,x2)
x1 = X1.flatten()
x2 = X2.flatten()
x3 = np.sin(x1) * np.cos(x2)
x4 = x1**2 + x2**2

# Pandas dataframe object from the data 
print("Creating Pandas dataframe object")
df = pd.DataFrame.from_dict({"x1": x1, "x2": x2, "x3": x3, "x4": x4})

# Put the dataframe into a HoloViews table
dtab = hv.Table(df)

# Make some linked scatter plots using datashader
scat1 = dtab.to.scatter('x1', 'x2', [])
scat2 = dtab.to.scatter('x1', 'x3', [])
scat3 = dtab.to.scatter('x2', 'x4', [])
hvds.datashade(scat1) + hvds.datashade(scat2) + hvds.datashade(scat3)

This produces the following

[image: the three datashaded plots produced by the code above]

which is pretty fantastically simple. However, it doesn't quite do what I want. Range changes and panning are linked, which is very cool, but data outside the visible range of one plot can still be plotted on the others. I would like that data to disappear from all plots, so that I only see the data that falls within all of the viewed data ranges. That way one could dynamically select a hypercube of data to highlight in the multidimensional space.

In addition, it would be good to have the Bokeh selection tools work the same way, so that, for instance, I could select some points on one plot and have them show up in red on the other plots. However, I am not getting the selection tools at all, despite asking for 'box_select' and 'lasso_select'. I probably asked for them incorrectly; it is not really clear to me how HoloViews passes options around.


2 Answers

4
votes

You can use HoloViews Streams to select the data to display using only the currently visible points. There's an example at: https://anaconda.org/petrenko/linking_datashaders
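
The filtering step that such a stream callback would apply can be sketched in plain pandas, independent of HoloViews. This is only an illustration: the helper name `visible_rows` and the small example dataframe are made up here, and treating a range of `None` as "no restriction" is an assumption of the sketch, not something taken from the linked notebook.

```python
import numpy as np
import pandas as pd

def visible_rows(df, xcol, ycol, x_range=None, y_range=None):
    """Return the rows of df whose (xcol, ycol) point lies inside both ranges.

    A range of None is treated as "no restriction" (an assumption of this sketch).
    """
    mask = np.ones(len(df), dtype=bool)
    for col, rng in ((xcol, x_range), (ycol, y_range)):
        if rng is not None:
            lo, hi = rng
            # Keep only rows whose value in this column is inside [lo, hi]
            mask &= ((df[col] >= lo) & (df[col] <= hi)).to_numpy()
    return df[mask]

# Tiny made-up dataset standing in for the real one
df = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0], "x2": [0.0, 5.0, 1.0, 9.0]})
subset = visible_rows(df, "x1", "x2", x_range=(0.5, 3.5), y_range=(0.0, 6.0))
print(subset)
```

A callback attached to a range stream could return `hv.Points` built from such a subset, so that the dependent plots only ever receive the currently visible rows.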

4
votes

Working from James' answer (https://stackoverflow.com/a/44288019/1447953), I extended the example in the question to the following. It takes one plot as the "master" control source and plots only the data that falls within that plot's visible data ranges onto a set of "slave" plots. It would be nice to have a two-way relationship, but this is pretty cool as it is.

import numpy as np
import pandas as pd
import holoviews as hv
import holoviews.operation.datashader as hvds
hv.notebook_extension('bokeh')
%opts Layout [shared_axes=False shared_datasource=True]

# Create some data to plot
x1 = np.arange(0,10,1e-2)
x2 = np.arange(0,10,1e-2)
X1,X2 = np.meshgrid(x1,x2)
x1 = X1.flatten()
x2 = X2.flatten()
x3 = np.sin(x1) * np.cos(x2)
x4 = x1**2 + x2**2

# Pandas dataframe object from the data 
print("Creating Pandas dataframe object")
df = pd.DataFrame.from_dict({"x1": x1, "x2": x2, "x3": x3, "x4": x4})

# Make some linked scatter plots using datashader
x1_x2 = hv.Points(df[['x1', 'x2']])
#x1_x3 = hv.Points(df[['x1', 'x3']])
#x2_x4 = hv.Points(df[['x2', 'x4']])

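from holoviews import streams

# The "master" plot: its visible x/y ranges control what the other plots show
maindata = x1_x2
mainx = 'x1'
mainy = 'x2'

def create_dynamic_map(xvar, yvar):
    """Return a DynamicMap of (xvar, yvar) limited to the master plot's visible ranges."""
    def link_function(x_range, y_range):
        x_min, x_max = x_range
        y_min, y_max = y_range
        # Keep only rows whose master coordinates lie inside the visible ranges
        visible = ((df[mainx] > x_min) & (df[mainx] < x_max) &
                   (df[mainy] > y_min) & (df[mainy] < y_max))
        return hv.Points(df[visible][[xvar, yvar]])
    # RangeXY re-runs link_function whenever the master plot is zoomed or panned
    dmap = hv.DynamicMap(link_function,
                         streams=[streams.RangeXY(x_range=(-100, 100),
                                                  y_range=(-100, 100),
                                                  source=maindata)],
                         kdims=[])
    return dmap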

x1_x3 = create_dynamic_map('x1','x3')
x2_x4 = create_dynamic_map('x2','x4')

hvds.datashade(x1_x2) + hvds.datashade(x1_x3) + hvds.datashade(x2_x4)
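
For the two-way version, one possible approach (not implemented above, and not tested against HoloViews) would be to collect the visible ranges of every plot, intersect them per column, and filter the dataframe once. The sketch below shows only that pandas part. The helper names `intersect_ranges` and `select_hypercube` and the toy data are hypothetical, and wiring the result back into several range streams is left open.

```python
import pandas as pd

def intersect_ranges(views):
    """Combine (column, (lo, hi)) pairs into a single interval per column."""
    combined = {}
    for col, (lo, hi) in views:
        if col in combined:
            # The same column is shown in more than one plot: keep the overlap
            old_lo, old_hi = combined[col]
            lo, hi = max(lo, old_lo), min(hi, old_hi)
        combined[col] = (lo, hi)
    return combined

def select_hypercube(df, ranges):
    """Return the rows of df that lie inside every column's interval."""
    mask = pd.Series(True, index=df.index)
    for col, (lo, hi) in ranges.items():
        mask &= df[col].between(lo, hi)  # inclusive on both ends
    return df[mask]

# Toy data and made-up axis ranges for two plots that share the x1 axis
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0],
                   "x2": [1.0, 2.0, 3.0, 4.0],
                   "x3": [0.1, 0.2, 0.3, 0.4]})
views = [("x1", (0.0, 3.5)), ("x2", (1.5, 5.0)),   # axes of plot 1
         ("x1", (1.5, 5.0)), ("x3", (0.0, 0.35))]  # axes of plot 2
ranges = intersect_ranges(views)
selected = select_hypercube(df, ranges)
print(selected)
```

If two plots show non-overlapping ranges for the same column, the combined interval is empty and no rows are selected, which matches the "only data visible in all plots" behaviour asked for in the question.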