
I am trying to use matplotlib to control the appearance of outliers in a "notched" box plot generated with seaborn. My code looks as follows:

ax = sns.boxplot(y="class", x="Proba", hue="Stage", data=df_tidy, notch=True,
                 showmeans=True, meanprops={"marker": ".", "markerfacecolor": "red", "markeredgecolor": "red"},
                 flierprops=dict(markerfacecolor='.1', markersize=.0018, linestyle="none", markeredgecolor='steelblue'),
                 boxprops=dict(alpha=.7), width=.3)

However, I have a fairly large number of outliers, which makes the box plot look unappealing: I am seeing a nearly continuous stream of outliers beyond the whiskers. Unfortunately, I am unable to generate fictitious data for this example, since reproducing the effect requires many outliers within an otherwise large dataset.

I tried to "improve" this somewhat by using an alternative color for the outliers and reducing their size, but that did not improve the result much. One option that worked modestly well was to set the "linestyle" argument within flierprops to "dotted".

However, is there a way to pass a "jitter" argument to the flierprops dictionary? Can somebody suggest a way to jitter the outliers?


1 Answer


The first part of the code below builds a minimal reproducible example (MRE) with a large dataset and many outliers.

To get very small markers, one can set markeredgecolor to 'none', which makes the outline around each marker completely invisible, and use the pixel marker ','. Many markers near the same spot can be rendered more legibly by adding alpha. You might need to experiment with your data to find the alpha value that gives the best result.

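After referring to this post and quite some experimenting, it looks like in this case we can get the lines belonging to the ax. Every seventh Line2D in the list seems to correspond to the outliers: with showmeans=True, each box appears to contribute two whiskers, two caps, a median, a mean and the fliers, in that order. Because the plot is horizontal, the categorical position is stored in the y-data, so we can extract the y-positions, add some jitter and write them back.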

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

N = 100000
x = np.where(np.random.randint(20, size=N * 6) > 0,
             np.random.normal(np.repeat([30, 35], N * 3), 10, N * 6) * np.tile([1, 1.3, 1.1], N * 2),
             np.random.normal(80, 5, N * 6))
y = np.tile(['A', 'B', 'C'], N * 2)
hue = np.repeat(['X', 'Y'], N * 3)

ax = sns.boxplot(y=y, x=x, hue=hue, notch=True,
                 showmeans=True, meanprops={"marker": ".", "markerfacecolor": "red", "markeredgecolor": "red"},
                 flierprops=dict(marker=',', markerfacecolor='steelblue', markeredgecolor='none', alpha=.1),
                 boxprops=dict(alpha=.7), width=.3)

for line in ax.get_lines()[6::7]:
    # line.set_mec('purple') # to test that we have the correct Line2Ds
    yoffsets = line.get_ydata()
    line.set_ydata(yoffsets + np.random.uniform(-0.05, 0.05, yoffsets.size))

[Figure: box plot with jittered outliers]
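The loop above can also be wrapped in a small reusable helper. The following is only a sketch: the name `jitter_fliers` and its parameters are made up for illustration, and it relies on the same unverified assumption as the answer, namely that every `stride`-th line holds the outliers of one box. It uses only the standard library and accepts any objects with `get_xdata`/`set_xdata`/`get_ydata`/`set_ydata` methods, which matplotlib's Line2D provides.

```python
import random


def jitter_fliers(lines, width=0.05, stride=7, horizontal=True, seed=None):
    """Jitter box-plot outliers along the categorical axis (hypothetical helper).

    ASSUMPTION: every `stride`-th entry of `lines` holds the fliers of one box
    (2 whiskers + 2 caps + median + mean + fliers = 7 when showmeans=True).
    Verify this on your own plot, e.g. by recolouring the selected lines.
    """
    rng = random.Random(seed)  # seeded so the jitter is reproducible
    fliers = lines[stride - 1::stride]
    for line in fliers:
        if horizontal:
            # Horizontal boxes: values are on x, categorical positions on y.
            pos = line.get_ydata()
            line.set_ydata([p + rng.uniform(-width, width) for p in pos])
        else:
            # Vertical boxes: categorical positions are on x.
            pos = line.get_xdata()
            line.set_xdata([p + rng.uniform(-width, width) for p in pos])
    return fliers
```

With the plot from the answer, this would be called as `jitter_fliers(ax.get_lines(), width=0.05, seed=0)` in place of the loop. For a vertical box plot, pass `horizontal=False`; without `showmeans=True` the stride would presumably be 6 rather than 7, which is worth checking before relying on it.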