3
votes

I am making a pd.scatter_matrix() plot from a DataFrame based on the Iris dataset, colored by the target variable (plant species). When I run the code below I get a scatter matrix with black, grey and white (!) scatter points, which hinders visualization. The grid seems inconsistent too: apparently only the plots close to the axes get grid lines. I wanted a nice grid and a scatter matrix following the sns default color palette (blue, green, red).

Why do the seaborn plot style and pd.scatter_matrix() enforce a different (awful!) color palette than the defaults for the scatter plots, along with inconsistent grid lines? How can I solve these visualization issues?

I already updated seaborn to a fairly recent version (0.8, July 2017). I also tried the non-deprecated version of the pandas scatter matrix plot, pd.plotting.scatter_matrix(), with no luck. If I use the 'ggplot' style the color palette is correct for the scatter plots but the grids are still inconsistent.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn')
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X, columns = iris.feature_names)

pd.scatter_matrix(df, c=y, figsize = [8,8],
                      s=80, marker = 'D');

enter image description here

Package versions:

pandas version: 0.20.1
matplotlib version: 2.0.2
seaborn version: 0.8.0

2
I guess pandas scatter matrix is not the best choice when it comes to styling. Are you aware of seaborn's PairGrid? - ImportanceOfBeingErnest
I was not. Thanks for pointing it out; it is much better than the pandas solution. I am going to use PairGrid from now on. The Python visualization landscape seems to be full of pitfalls... - Francio Rodrigues
@franciobr Could you please clarify what exactly your problem is? Default/seaborn matplotlib aesthetics, or something else? - Sergey Bushmanov
@SergeyBushmanov thanks for helping out. The plot looks awful and is nothing like the default aesthetics of seaborn. I don't know which color palette the scatter plots are getting the black/grey/white dots from. It is not the default seaborn (blue, green, red) or matplotlib palette, and the grid lines are buggy. I was hoping someone could point out some mistake in the way I am using pd.scatter_matrix(), but I guess the takeaway is that pd.scatter_matrix screws up the style of the plots and one should use other functions such as sns.PairGrid instead. - Francio Rodrigues

2 Answers

7
votes

I am not sure if this answers your question, but you could use pairplot. Let me know.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X, columns = iris.feature_names)

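# pandas scatter matrix, colored by the numeric target
pd.plotting.scatter_matrix(df, c=y, figsize=[8, 8],
                           s=80, marker='D');

# add the target as a column so seaborn can use it for hue
df['y'] = y

# seaborn pairplot, colored by that column
sns.pairplot(df, hue='y')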

which gives you:

enter image description here

If you want to avoid the last row of plots in that visualization (the extra one for the y column), then:

import seaborn as sns
sns.set(style="ticks", color_codes=True)
%matplotlib inline

# seaborn's example iris dataset; "species" is a string column
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species")

enter image description here
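
The PairGrid mentioned in the comments gives more control than pairplot over what is drawn in each panel. Below is a minimal sketch, not the exact code behind the figures above: the small random DataFrame and its column names (a, b, c, species) are made-up stand-ins so that it runs without downloading anything, and the Agg backend is only there so it works without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data: three numeric columns plus a string label column
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
demo["species"] = np.repeat(["setosa", "versicolor", "virginica"], 10)

# One grid cell per pair of numeric variables, colored by the label column
g = sns.PairGrid(demo, hue="species", vars=["a", "b", "c"])
g.map_diag(plt.hist)                           # histograms on the diagonal
g.map_offdiag(plt.scatter, s=40, marker="D")   # diamond markers elsewhere
g.add_legend()
```

Because the hue column holds strings and vars lists only the numeric columns, no extra row or column is drawn for the label itself.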

2
votes

Default matplotlib settings are not very aesthetic; however, do not underestimate the power of matplotlib.

The simplest solution to your problem might be:

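import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot') # this is the trick

from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X, columns = iris.feature_names)

# pd.plotting.scatter_matrix is the non-deprecated name for pd.scatter_matrix
pd.plotting.scatter_matrix(df, c=y, figsize = [10,10], s=50);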

enter image description here

(The full list of available styles can be accessed via plt.style.available.)
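
If you would rather not change the style globally, a style can also be applied to a single plot. A minimal sketch, assuming a small random DataFrame (the columns a, b, c and the labels array are hypothetical stand-ins for the iris data) and the Agg backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: three numeric columns and an integer class label
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
labels = np.repeat([0, 1, 2], 10)

# Names that can be passed to plt.style.use / plt.style.context
print(sorted(plt.style.available))

# Apply 'ggplot' only inside this block; the global style is left untouched
with plt.style.context("ggplot"):
    axes = pd.plotting.scatter_matrix(demo, c=labels, figsize=(6, 6), s=50)
```

scatter_matrix returns the array of axes, so individual panels can still be adjusted afterwards.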

You may further customize the plot to your needs by adjusting the matplotlibrc file. An example of what can be done with it can be found here
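
The same settings that live in matplotlibrc can also be overridden at runtime through plt.rc_context. A minimal sketch, assuming the grey points come from the style's default colormap (image.cmap) being applied to the numeric c values; that is my reading of the symptom, not something verified against the setup in the question. The data, the column names and the particular override values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: three numeric columns and an integer class label
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
labels = np.repeat([0, 1, 2], 10)

# Illustrative overrides; any key listed in plt.rcParams can be used here
overrides = {
    "image.cmap": "viridis",  # colormap used when c= holds plain numbers
    "axes.grid": True,        # turn the grid on for every panel
    "grid.linestyle": ":",
}

# The overrides apply only to artists created inside the block
with plt.rc_context(overrides):
    axes = pd.plotting.scatter_matrix(demo, c=labels, figsize=(6, 6), s=50)
```

Passing cmap="viridis" directly to scatter_matrix should have the same effect on the colors, since extra keyword arguments are forwarded to the underlying scatter call.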