9
votes

I am trying to use Bokeh to plot a streaming dataset within a Jupyter notebook. Here is what I have so far.

From the command line I start the bokeh server by running the command

$> bokeh serve

Here is the code from my Jupyter notebook

import numpy as np
from IPython.display import clear_output
# ------------------- new cell ---------------------#

from bokeh.models.sources import ColumnDataSource
from bokeh.client import push_session
from bokeh.driving import linear
from bokeh.plotting import figure
from bokeh.io import curdoc, output_notebook, show
# ------------------- new cell ---------------------#

output_notebook()
# ------------------- new cell ---------------------#

my_figure = figure(plot_width=800, plot_height=400)
test_data = ColumnDataSource(data=dict(x=[0], y=[0]))
linea = my_figure.line("x", "y", source=test_data)
# ------------------- new cell ---------------------#

new_data=dict(x=[0], y=[0])
x = []
y = []

step_size = 0.1  # increment for increasing step
@linear(m=step_size, b=0)
def update(step):
    x.append(step)
    y.append(np.random.rand())
    new_data['x'] = x
    new_data['y'] = y

    test_data.stream(new_data, 10)

    clear_output()
    show(my_figure)

    if step > 10: 
        session.close()    
# ------------------- new cell ---------------------#

# open a session to keep our local document in sync with server
session = push_session(curdoc())

period = 100  # in ms
curdoc().add_periodic_callback(update, period)

session.show()  # open a new browser tab with the updating plot

session.loop_until_closed()

Currently, the result I get is a flashing plot within the Jupyter notebook and also a nicely updating plot in a new browser tab. I would like either of the following

  • a nicely updating plot in Jupyter, without the flashing
  • just the plot in the new browser tab

I tried removing show(my_figure), but then each update opened a new tab. I also tried shortening the refresh period to 10 ms (period = 10); the session.show() tab works great, but the notebook eventually crashes because it cannot refresh that fast.

How do I get a good refresh rate of the bokeh plot in Jupyter? Or how do I turn off the Jupyter plot and only have one tab showing the updating plot?

2
For my (slow) Windows computer the 'show(my_figure)' plot is even more skewed. Even at rather slow refresh rates (500 ms+) it won't update correctly. I have to leave and come back to the tab in order to refresh the x-axis, and points disappear from the plot. (Thanks for the downvote earlier; please keep in mind that new users CANNOT comment!) - Markus Bleuel
On a MacBook Pro I get similar behavior to what you describe above, though with the 100 ms refresh rate the plot in Jupyter cannot render quickly enough and no line is visible. After adjusting the period to 500 the Jupyter plot would show up, but it scrolled up every time it updated, making it hard to see. - Steven C. Howell
A note from @bigreddot: The answers are either push_notebook or, with 0.12.5, [bokeh server app embedded in notebook](https://github.com/bokeh/bokeh/blob/master/examples/howto/server_embed/notebook_embed.ipynb). You are currently recreating or reshowing every plot in its entirety, which will give poor results. Also, using bokeh.client doubles the amount of network traffic. - Steven C. Howell

2 Answers

13
votes

Here is the code for a modified notebook, following @bigreddot's comment, which uses push_notebook to produce a much cleaner result within the notebook (it does not require you to run bokeh serve for the plotting). It does not use a callback; I'm not sure whether this is an advantage or not. As written, the plot updates on a fixed timer. If you want it to update only when a new data point comes in, you could add an if data_event: statement at the beginning of the while loop, then tune the sleep time to work well with the event rate.
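One way to sketch the "update only when data arrives" idea is to block on a queue instead of sleeping for a fixed period. This is a minimal illustration using only the standard library, not part of the original notebook: the names consume, produce and on_point are hypothetical, and the lambda stands in for the test_data.stream(...) and push_notebook(...) calls.

```python
import queue
import threading
import time


def consume(events, on_point, timeout=0.5):
    """Call on_point(x, y) for each queued point; stop on None or when idle."""
    handled = 0
    while True:
        try:
            point = events.get(timeout=timeout)  # blocks until data arrives
        except queue.Empty:
            break  # nothing arrived within `timeout` seconds, stop waiting
        if point is None:  # sentinel: the producer has finished
            break
        on_point(*point)
        handled += 1
    return handled


def produce(events, n_points, period):
    """Hypothetical data source: emits n_points, one every `period` seconds."""
    for i in range(n_points):
        events.put((i * 0.1, i % 3))
        time.sleep(period)
    events.put(None)


events = queue.Queue()
received = []
producer = threading.Thread(target=produce, args=(events, 5, 0.01))
producer.start()

# In the notebook, on_point would stream the point and call push_notebook.
count = consume(events, lambda x, y: received.append((x, y)))
producer.join()
print(count, "points handled")
```

With this shape there is no sleep time to tune: the loop wakes as soon as a point is available and otherwise waits.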

This page from the official documentation provides additional helpful information regarding using Bokeh in a Jupyter notebook.

import time
import numpy as np
# ------------------- new cell ---------------------#

from bokeh.models.sources import ColumnDataSource
from bokeh.plotting import figure
from bokeh.io import output_notebook, show, push_notebook
# ------------------- new cell ---------------------#

output_notebook()
# ------------------- new cell ---------------------#

my_figure = figure(plot_width=800, plot_height=400)
test_data = ColumnDataSource(data=dict(x=[0], y=[0]))
line = my_figure.line("x", "y", source=test_data)
handle = show(my_figure, notebook_handle=True)

new_data=dict(x=[0], y=[0])
x = []
y = []

step = 0
step_size = 0.1  # increment for increasing step
max_step = 10  # arbitrary stop point for example
period = .1  # in seconds (simulate waiting for new data)
n_show = 10  # number of points to keep and show
while step < max_step:
    x.append(step)
    y.append(np.random.rand())
    new_data['x'] = x = x[-n_show:]  # prevent filling ram
    new_data['y'] = y = y[-n_show:]  # prevent filling ram

    test_data.stream(new_data, n_show)

    push_notebook(handle=handle)
    step += step_size
    time.sleep(period)

Note the addition of new_data['x'] = x = x[-n_show:] (same for y), so this could in theory run indefinitely without filling your memory. Also, it would be nice to actually stream some kind of data source (maybe from the web) to make this a more realistic example. Lastly, you probably realize this, but after you run the cell with the streaming plot, the kernel will be locked until it completes or is interrupted; you cannot execute additional cells/code. If you want to have analysis/control features, they should go inside the while loop.
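Because stream takes a rollover argument that keeps only the most recent entries, it should be enough to pass just the newest point on each iteration rather than the whole trimmed list. The following is a pure-Python model of that trimming behavior, using a deque as a stand-in for one ColumnDataSource column. It is an illustration of the idea, not Bokeh's implementation.

```python
from collections import deque

n_show = 10  # same role as n_show in the loop above

# Stand-in for a single ColumnDataSource column (an assumption for illustration).
x_column = deque([0], maxlen=n_show)

for i in range(1, 26):
    step = round(i * 0.1, 1)
    x_column.append(step)  # pass only the newest value; old ones fall off the front

print(list(x_column))
```

In the notebook loop this would correspond to calling test_data.stream(dict(x=[step], y=[np.random.rand()]), n_show), so the separate x and y lists are no longer needed.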

1
votes

@Steven C. Howell

Inspired by your example, I modified it to use a non-blocking callback function. It does not use add_periodic_callback, since that feature does not work in Jupyter notebooks (as mentioned in the Bokeh documentation). However, it can be useful to stream data without blocking when working in Jupyter notebooks.

import time
import numpy as np
# ------------------- new cell ---------------------#

from bokeh.models.sources import ColumnDataSource
from bokeh.plotting import figure
from bokeh.io import output_notebook, show, push_notebook
# ------------------- new cell ---------------------#

output_notebook()
# ------------------- new cell ---------------------#

my_figure = figure(plot_width=800, plot_height=400)
test_data = ColumnDataSource(data=dict(x=[0], y=[0]))
line = my_figure.line("x", "y", source=test_data)
handle = show(my_figure, notebook_handle=True)
# ------------------- new cell ---------------------#

from threading import Thread

stop_threads = False

def blocking_callback(id, stop):
    new_data=dict(x=[0], y=[0])

    step = 0
    step_size = 0.1  # increment for increasing step
    max_step = 10  # arbitrary stop point for example
    period = .1  # in seconds (simulate waiting for new data)
    n_show = 10  # number of points to keep and show

    while True:

        new_data['x'] = [step]
        new_data['y'] = [np.random.rand()]

        test_data.stream(new_data, n_show)

        push_notebook(handle=handle)
        step += step_size
        time.sleep(period)

        if stop():
            print("exit")
            break

thread = Thread(target=blocking_callback, args=(id, lambda: stop_threads))
thread.start()

The advantage is that the infinite streaming loop runs in its own thread, so it does not block subsequent cells from executing:

# ------------------- new cell ---------------------#

# preceding streaming is not blocking
for cnt in range(10):
    print("Do this, while plot is still streaming", cnt)

# ------------------- new cell ---------------------#

# you might also want to stop the thread
stop_threads = True
thread.join()  # wait for the streaming loop to see the flag and exit
del thread
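As an alternative to the global flag and lambda, the standard library's threading.Event can carry the stop signal. This is only a sketch of that pattern: stream_worker and ticks are hypothetical stand-ins, and the append takes the place of the test_data.stream(...) and push_notebook(...) calls.

```python
import threading
import time

stop_event = threading.Event()
ticks = []


def stream_worker(stop, period=0.01):
    """Stand-in for the streaming loop; records a tick instead of plotting."""
    step = 0
    while not stop.is_set():
        ticks.append(step)  # in the notebook: stream the point, then push_notebook
        step += 1
        stop.wait(period)  # sleeps, but returns early once stop is set


worker = threading.Thread(target=stream_worker, args=(stop_event,))
worker.start()

time.sleep(0.05)        # other cells could run here while the worker streams
stop_event.set()        # request shutdown
worker.join(timeout=1)  # wait for the loop to notice and exit
print("worker alive:", worker.is_alive())
```

Using stop.wait(period) instead of time.sleep(period) means the thread reacts to the stop request immediately rather than after the current sleep finishes.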