
I have data in a pandas DataFrame whose row labels are datetimes (a pandas.tseries index). I can plot the data with matplotlib and get this figure:

[figure: matplotlib plot of the data]

However, I want to use plotly to draw the same graph in interactive mode. It works as follows, but it doesn't show the dates; instead, the x-axis is replaced with integer indexing:

https://plot.ly/~vmirjalily/5/

The figure in the URL above is plotted using this code:

dfmean = df.mean(axis=1)
dfmean_mavg = pd.rolling_mean(dfmean, 50)

dfmean.plot(linewidth=1.5, label='Mean of 20')
dfmean_mavg.plot(linewidth=3, label='Moving Avg.')
#plt.legend(loc=2)

l1 = plt.plot(dfmean, 'b-', linewidth=3)
l2 = plt.plot(dfmean_mavg, 'g-', linewidth=4)

mpl_fig1 = plt.gcf()

py.iplot_mpl(mpl_fig1, filename='avg-price.20stocks')

But this code doesn't show the datetime index on the x-axis. I tried to force the datetime index as below:

l1 = plt.plot(np.array(dfmean.index), dfmean, 'b-', linewidth=3)
l2 = plt.plot(np.array(dfmean_mavg.index), dfmean_mavg, 'g-', linewidth=4)

mpl_fig1 = plt.gcf()

py.iplot_mpl(mpl_fig1, filename='avg-price.20stocks')

But that raised the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-4a3ca217202d> in <module>()
     14 mpl_fig1 = plt.gcf()
     15 
---> 16 py.iplot_mpl(mpl_fig1, filename='avg-price.20stocks')

/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in iplot_mpl(fig, resize, strip_style, update, **plot_options)
    257             "object. Run 'help(plotly.graph_objs.Figure)' for more info."
    258         )
--> 259     return iplot(fig, **plot_options)
    260 
    261 

/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in iplot(figure_or_data, **plot_options)
    113     if 'auto_open' not in plot_options:
    114         plot_options['auto_open'] = False
--> 115     res = plot(figure_or_data, **plot_options)
    116     urlsplit = res.split('/')
    117     username, plot_id = urlsplit[-2][1:], urlsplit[-1]  # TODO: HACKY!

/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in plot(figure_or_data, validate, **plot_options)
    212                 pass
    213     plot_options = _plot_option_logic(plot_options)
--> 214     res = _send_to_plotly(figure, **plot_options)
    215     if res['error'] == '':
    216         if plot_options['auto_open']:

/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in _send_to_plotly(figure, **plot_options)
    971     fig = tools._replace_newline(figure)  # does not mutate figure
    972     data = json.dumps(fig['data'] if 'data' in fig else [],
--> 973                       cls=utils._plotlyJSONEncoder)
    974     username, api_key = _get_session_username_and_key()
    975     kwargs = json.dumps(dict(filename=plot_options['filename'],

/usr/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, **kw)
    236         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    237         separators=separators, encoding=encoding, default=default,
--> 238         **kw).encode(obj)
    239 
    240 

/usr/lib/python2.7/json/encoder.pyc in encode(self, o)
    199         # exceptions aren't as detailed.  The list call should be roughly
    200         # equivalent to the PySequence_Fast that ''.join() would do.
--> 201         chunks = self.iterencode(o, _one_shot=True)
    202         if not isinstance(chunks, (list, tuple)):
    203             chunks = list(chunks)

/usr/lib/python2.7/json/encoder.pyc in iterencode(self, o, _one_shot)
    262                 self.key_separator, self.item_separator, self.sort_keys,
    263                 self.skipkeys, _one_shot)
--> 264         return _iterencode(o, 0)
    265 
    266 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/usr/local/lib/python2.7/dist-packages/plotly/utils.pyc in default(self, obj)
    144                 if s is not None:
    145                     return s
--> 146             raise e
    147         return json.JSONEncoder.default(self, obj)
    148 

TypeError: masked is not JSON serializable

Here are my package versions:

IPython 2.0.0
numpy 1.9.0
numexpr 2.2.2
pandas 0.15.0
matplotlib 1.4.0
plotly 1.4.7

And the first 10 lines of my dataframe:

Date
2011-01-04    54.2430
2011-01-05    54.3935
2011-01-06    54.4665
2011-01-07    54.5920
2011-01-10    54.9435
2011-01-11    54.9340
2011-01-12    55.4755
2011-01-13    55.5495
2011-01-14    56.0230
dtype: float64

1 Answer


There are a couple of things going on here.

The traceback:

This traceback is telling you that masked values can't be serialized to JSON. Masked values are slightly different from NaN. Here's a bit of info if you're curious: http://pandas.pydata.org/pandas-docs/dev/gotchas.html#nan-integer-na-values-and-na-type-promotions

The reason you have masked values is the moving average calculation. With a window of N points, the first N-1 results are undefined because the window is not yet full, and those are the values that end up masked.
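To see where the undefined leading values come from, here is a minimal pure-Python sketch of a trailing moving average. The function name `trailing_mean` and the sample prices are made up for illustration; this is not how pandas implements it, only the same idea in a few lines.

```python
import math

def trailing_mean(values, window):
    """Hypothetical helper: mean of the last `window` values at each position.

    Positions where fewer than `window` values are available get NaN,
    because the window is not yet full.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out

prices = [54.0, 55.0, 56.0, 57.0, 58.0]  # made-up sample data
smoothed = trailing_mean(prices, 3)

# Count the leading undefined entries: window - 1 of them.
undefined = sum(1 for v in smoothed if math.isnan(v))
print(smoothed, undefined)
```

With a window of 3, the first 2 entries are undefined. With the window of 50 used in the question, that would be the first 49.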

Therefore, if you get rid of those values by manipulating the data frame (for example, by dropping them before plotting), you won't see that issue any more.

Taking a cue from what DataFrame.to_json() does with masked values (it turns them into null), the most appropriate replacement value in your list would be None, if you go down that road. None translates best to null.
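As a rough sketch of the None replacement described above, using only the standard library: the helper name `nan_to_none` and the sample values are assumptions for illustration, and with a pandas Series you could instead drop the entries with `dropna()`.

```python
import json
import math

# Made-up stand-in for the moving-average values: the leading entries
# are undefined (NaN) because the window was not yet full.
values = [float("nan"), float("nan"), 54.37, 54.48]

def nan_to_none(seq):
    """Hypothetical helper: copy of seq with NaN entries replaced by None."""
    return [None if isinstance(v, float) and math.isnan(v) else v for v in seq]

cleaned = nan_to_none(values)

# None is serialized as JSON null, which a plotting backend can treat as a gap.
payload = json.dumps(cleaned)
print(payload)  # [null, null, 54.37, 54.48]
```

Replacing rather than dropping keeps the list the same length as the date index, so the x and y values stay aligned.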

The integers on the x axis

A bit of background. In matplotlib, dates are floating-point values representing the number of days since 0001-01-01, plus 1 (see the matplotlib dates documentation for more info). However, importing pandas alters this to use a different date representation: the number of days since the Unix epoch, which is also a floating-point number. Version 1.4.7 of plotly was meant to handle both representations by converting back to an ISO string, but perhaps you've found another code path. I can't seem to reproduce this error on my end, though. Here's the code I tried:

import random
import pandas as pd
import matplotlib.pyplot as plt
import plotly.plotly as py
import plotly.tools as tls

# Build a random daily series with a datetime index
num_pts = 1000
data = [random.random() for i in range(num_pts)]
index = pd.date_range('2011-01-04', periods=num_pts)
df = pd.DataFrame(data=data, index=index)

# Row-wise mean and its 50-point moving average (pandas 0.15 API)
dfmean = df.mean(axis=1)
dfmean_mavg = pd.rolling_mean(dfmean, 50)

# Plot only the mean; the moving average line is left out on purpose
dfmean.plot(linewidth=1.5, label='Mean of 20')
# dfmean_mavg.plot(linewidth=3, label='Moving Avg.')

# Convert the current matplotlib figure and send it to plotly
mpl_fig1 = plt.gcf()
py.plot_mpl(mpl_fig1, filename='avg-price.20stocks')
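To make the two date representations mentioned earlier concrete, here is a small standard-library sketch that turns each kind of floating-point day count back into an ISO string. The helper names are made up for illustration, and this is not plotly's actual conversion code, only the same idea.

```python
from datetime import date, datetime, timedelta

UNIX_EPOCH = datetime(1970, 1, 1)

def ordinal_days_to_iso(days):
    """Hypothetical helper: day count where 0001-01-01 is day 1 -> ISO string."""
    whole = int(days)
    frac = days - whole  # fraction of a day, i.e. the time of day
    return (datetime.fromordinal(whole) + timedelta(days=frac)).isoformat()

def epoch_days_to_iso(days):
    """Hypothetical helper: days since the Unix epoch -> ISO string."""
    return (UNIX_EPOCH + timedelta(days=days)).isoformat()

d = date(2011, 1, 4)  # first date in the question's data
ordinal_days = float(d.toordinal())
epoch_days = float((d - date(1970, 1, 1)).days)

# Two very different numbers, one and the same date.
print(ordinal_days, epoch_days)
print(ordinal_days_to_iso(ordinal_days))  # 2011-01-04T00:00:00
print(epoch_days_to_iso(epoch_days))      # 2011-01-04T00:00:00
```

If a converter assumes the wrong representation, or skips the conversion entirely, the axis ends up showing plain numbers instead of dates.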

Calling plt.plot on the series

It looks like you plot parts of your data twice: once with the .plot method and once with plt.plot. I'm more familiar with calling the plot method directly on a data frame, which is why I only included that version in the code snippet above.

TL;DR, just fix it.

There's a PR open on Plotly's Python API GitHub repo to handle this: https://github.com/plotly/python-api/pull/159. It should be up on PyPI tomorrow.