
I tried to run one of the examples in the book Python Data Science Essentials, but it raised errors when I ran it. I have only just started learning Python, so I find these errors hard to fix. Please help me. Here is the code:

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris  # assumed: the book loads iris in an earlier step

iris = load_iris()
colors = list()
palette = {0: "red", 1: "green", 2: "blue"}

# using the palette dictionary, we convert
# each numeric class into a color string
for c in np.nditer(iris.target):
    colors.append(palette[int(c)])

dataframe = pd.DataFrame(iris.data, columns=iris.feature_names)
scatterplot = pd.scatter_matrix(dataframe, alpha=0.3,
    figsize=(10, 10), diagonal='hist', color=colors, marker='o',
    grid=True)

Here are the errors:

ValueError                                Traceback (most recent call last)
in ()
      1 scatterplot = pd.scatter_matrix(dataframe, alpha=0.3,
----> 2 figsize=(10, 10), diagonal='hist', color=colors, marker='o',grid=True)

/Users/leeivan/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py in scatter_matrix(frame, alpha, figsize, ax, grid, diagonal, marker, density_kwds, hist_kwds, range_padding, **kwds)
    378
    379             ax.scatter(df[b][common], df[a][common],
--> 380                        marker=marker, alpha=alpha, **kwds)
    381
    382             ax.set_xlim(boundaries_list[j])

/Users/leeivan/anaconda/lib/python2.7/site-packages/matplotlib/__init__.pyc in inner(ax, *args, **kwargs)
   1817                 warnings.warn(msg % (label_namer, func.__name__),
   1818                               RuntimeWarning, stacklevel=2)
-> 1819         return func(ax, *args, **kwargs)
   1820     pre_doc = inner.__doc__
   1821     if pre_doc is None:

/Users/leeivan/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   3787             facecolors = co
   3788             if c is not None:
-> 3789                 raise ValueError("Supply a 'c' kwarg or a 'color' kwarg"
   3790                                  " but not both; they differ but"
   3791                                  " their functionalities overlap.")

ValueError: Supply a 'c' kwarg or a 'color' kwarg but not both; they differ but their functionalities overlap.

If you think it appropriate, now that both a resolution and an explanation of the problem have been provided, please tick the answer as addressing the question. Thanks! (Enzo)

2 Answers

9
votes

I tested the code below in Jupyter with Python 3.5, and it works.

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
%matplotlib inline

iris = load_iris()
colors = list()
palette = {0: "red", 1: "green", 2: "blue"}

# using the palette dictionary, we convert
# each numeric class into a color string
for c in np.nditer(iris.target):
    colors.append(palette[int(c)])

dataframe = pd.DataFrame(iris.data, columns=iris.feature_names)
# c= instead of color= avoids the ValueError
# (in pandas >= 0.20 this function is pd.plotting.scatter_matrix)
scatterplot = pd.scatter_matrix(dataframe, alpha=0.3,
    figsize=(10, 10), diagonal='hist', c=colors, marker='o', grid=True)

Clearly, the color parameter is what triggers the error, while c works. That said, it could also be a bug in matplotlib.
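As an aside, the np.nditer loop that builds colors can be written as a plain list comprehension. A minimal sketch, assuming a small hand-made label array as a stand-in for iris.target (integer classes 0 to 2):

```python
import numpy as np

# Stand-in for iris.target (assumption: integer class labels 0, 1, 2)
target = np.array([0, 0, 1, 2, 1])
palette = {0: "red", 1: "green", 2: "blue"}

# Same mapping as the np.nditer loop: numeric class -> color string
colors = [palette[int(label)] for label in target]
print(colors)  # ['red', 'red', 'green', 'blue', 'green']
```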

This is what I found when looking at the pandas function:

def scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False,
                   diagonal='hist', marker='.', density_kwds=None,
                   hist_kwds=None, range_padding=0.05, **kwds):
    """
    Draw a matrix of scatter plots.
    Parameters
    ----------
    frame : DataFrame
    alpha : float, optional
        amount of transparency applied
    figsize : (float,float), optional
        a tuple (width, height) in inches
    ax : Matplotlib axis object, optional
    grid : bool, optional
        setting this to True will show the grid
    diagonal : {'hist', 'kde'}
        pick between 'kde' and 'hist' for
        either Kernel Density Estimation or Histogram
        plot in the diagonal
    marker : str, optional
        Matplotlib marker type, default '.'
    hist_kwds : other plotting keyword arguments
        To be passed to hist function
    density_kwds : other plotting keyword arguments
        To be passed to kernel density estimate plot
    range_padding : float, optional
        relative extension of axis range in x and y
        with respect to (x_max - x_min) or (y_max - y_min),
        default 0.05
    kwds : other plotting keyword arguments
        To be passed to scatter function
    """
    # ... (function body omitted)

So it appears that color or c is passed to matplotlib's scatter function as one of the **kwds in the function call.
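The forwarding described above can be illustrated with plain Python. A minimal sketch with two made-up functions: outer_plot is a hypothetical stand-in for scatter_matrix and inner_scatter a hypothetical stand-in for matplotlib's scatter. Named parameters are consumed by the outer function, and anything else travels through **kwds untouched:

```python
def inner_scatter(x, y, marker="o", alpha=None, **kwargs):
    # Hypothetical stand-in for matplotlib's scatter:
    # report which extra keyword arguments arrived
    return sorted(kwargs)

def outer_plot(data, alpha=0.5, marker=".", **kwds):
    # Hypothetical stand-in for scatter_matrix: alpha and marker are
    # named parameters here, everything else is forwarded via **kwds
    return inner_scatter(data, data, marker=marker, alpha=alpha, **kwds)

print(outer_plot([1, 2, 3], color="red"))  # ['color']
print(outer_plot([1, 2, 3], c="red"))      # ['c']
```

Neither color nor c is a named parameter of outer_plot, so both end up in the inner call exactly as the caller wrote them.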

This is the signature of matplotlib's scatter function:

matplotlib.pyplot.scatter(x, y, s=20, c=None, marker='o', cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, verts=None, edgecolors=None, hold=None, data=None, **kwargs)

Here the parameter is c, not color, but elsewhere in the documentation color is listed as an alternative to c (as you would expect).
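The overlap can be checked by calling matplotlib directly, without pandas. A minimal sketch, assuming the non-interactive Agg backend and three toy points: either c or color works on its own, while passing both raises ValueError. The exact message wording differs between matplotlib versions, so it is printed rather than asserted:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x, y = [1, 2, 3], [3, 1, 2]
point_colors = ["red", "green", "blue"]

ax.scatter(x, y, c=point_colors)      # c alone: accepted
ax.scatter(x, y, color=point_colors)  # color alone: accepted

raised = False
try:
    ax.scatter(x, y, c=point_colors, color="black")  # both: rejected
except ValueError as err:
    raised = True
    print("ValueError:", err)

plt.close(fig)
```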

I have opened an issue with matplotlib and will keep you informed.

Update as of 11/12/2016:

After some discussion, the bug has been accepted by pandas and scheduled for a fix in the next major release (see the issue on GitHub).

Basically, when c is specified, only c is sent to matplotlib's scatter function. When color is specified, both c and color are sent, which confuses matplotlib.
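The mechanism described above can be sketched with a wrapper that fills in a default c on the caller's behalf. This is a simplified, hypothetical reconstruction, not the actual pandas source: fake_scatter, buggy_scatter_matrix and fixed_scatter_matrix are made-up names, and the default "b" is an assumption. fake_scatter mimics matplotlib's check that c and color are not both supplied:

```python
def fake_scatter(c=None, **kwargs):
    # Mimics matplotlib's check: 'c' and 'color' must not both be supplied
    color = kwargs.get("color")
    if c is not None and color is not None:
        raise ValueError("Supply a 'c' kwarg or a 'color' kwarg but not both")
    return c if c is not None else color

def buggy_scatter_matrix(**kwds):
    # Hypothetical wrapper: always fills in a default 'c',
    # even when the caller already chose a colour via 'color'
    kwds.setdefault("c", "b")
    return fake_scatter(**kwds)

def fixed_scatter_matrix(**kwds):
    # Hypothetical fix: only add the default when no colour was given at all
    if "c" not in kwds and "color" not in kwds:
        kwds["c"] = "b"
    return fake_scatter(**kwds)

print(buggy_scatter_matrix(c="red"))      # red (the caller's c overrides the default)
print(fixed_scatter_matrix(color="red"))  # red

raised = False
try:
    buggy_scatter_matrix(color="red")  # default c and the caller's color both arrive
except ValueError as err:
    raised = True
    print("ValueError:", err)
```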

For the time being, as suggested, use c instead of color.
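A minimal sketch of the workaround against a current pandas, where the function lives at pd.plotting.scatter_matrix (pd.scatter_matrix was deprecated in 0.20 and later removed). To keep the sketch free of the scikit-learn dependency, random data with made-up column names stands in for the iris measurements, and the Agg backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in for iris.data (assumption: 4 numeric columns, 3 classes of 10 rows)
frame = pd.DataFrame(rng.normal(size=(30, 4)), columns=["f1", "f2", "f3", "f4"])
target = np.repeat([0, 1, 2], 10)
palette = {0: "red", 1: "green", 2: "blue"}
colors = [palette[int(t)] for t in target]

# Pass the per-point colours as c=, as recommended above
axes = pd.plotting.scatter_matrix(frame, alpha=0.3, figsize=(10, 10),
                                  diagonal="hist", c=colors, marker="o",
                                  grid=True)
print(axes.shape)  # (4, 4): one subplot per pair of columns
plt.close("all")
```

With real data, replace frame and target with pd.DataFrame(iris.data, columns=iris.feature_names) and iris.target.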

-4
votes

I'm on my phone at the moment and can't trace it through, but here is what I can tell you. A kwarg is a keyword argument: an argument you pass to a function by name.

scatterplot = pd.scatter_matrix(dataframe, alpha=0.3,figsize=(10, 10),diagonal='hist',color=colors, marker='o',grid=True)

Right there, color=colors is a keyword argument. Somewhere further down the chain of function calls, it looks like c also becomes a keyword argument. I don't see how you can change that, but you could drop your color kwarg, which might fix things for now. Otherwise, you need to look at the functions in your stack trace and find out where c becomes a kwarg.