1
votes

I am creating a series of boxplots to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image, right); however, for some of them the x-axis collapses slightly (see image, left) or strongly (see image, middle): https://i.imgur.com/dxLR4B4.png

I looked at how seaborn plots a box/violin plot in https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961):

violin_data = remove_na(group_data[hue_mask])

I realized that this happens when there are too many NaNs.

Is there any way to prevent this collapsing in code only? I do not want to modify my dataframe (for example, by replacing the NaNs with zero).

Below you find my code:

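import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# pf_in / pf_out: input TSV path and output image path (defined elsewhere)
boxp_df = pd.read_csv(pf_in, sep="\t", skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)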

The output is a differently sized plot per cancer type (depending on whether any category is completely NaN). I expect each plot to have the same width.

Update: Trying to use the order parameter as suggested leads to the following output: https://i.imgur.com/uSm13Qw.png

Maybe this toy example helps?

|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93|    |0.52|    |6.01
|3.34|    |0.89|    |2.89
|3.39|    |1.96|    |4.63
|1.59|    |3.66|    |3.75
|2.73|    |0.39|    |2.87
|0.08|    |1.25|    |-0.27
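
For anyone who wants to try the toy table without an input file, here is a minimal sketch. It assumes the data is tab-separated, as in the `read_csv` call in the question; the name `TOY_TSV` and the all-NaN check are illustrative additions, not part of the original code.

```python
import io

import pandas as pd

# Toy data from the question, written as tab-separated text (assumed format).
# Cat2 and Cat4 are empty in every row.
TOY_TSV = (
    "Cat1\tCat2\tCat3\tCat4\tCat5\n"
    "3.93\t\t0.52\t\t6.01\n"
    "3.34\t\t0.89\t\t2.89\n"
    "3.39\t\t1.96\t\t4.63\n"
    "1.59\t\t3.66\t\t3.75\n"
    "2.73\t\t0.39\t\t2.87\n"
    "0.08\t\t1.25\t\t-0.27\n"
)

boxp_df = pd.read_csv(io.StringIO(TOY_TSV), sep="\t", skip_blank_lines=False)

# List the categories that contain no values at all (nothing to draw for them).
all_nan_cols = [col for col in boxp_df.columns if boxp_df[col].isna().all()]
print(all_nan_cols)
```

Passing this `boxp_df` to the plotting code in the question should make it possible to check whether the empty categories alone change the plot width.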

Update: Apparently, the problem is not the data but the length of the title: https://github.com/matplotlib/matplotlib/issues/4413

Therefore I would close the question. @Diziet, should I delete it, or might my issue help others? Sorry for not including the line below in the code example:

ax.set_title("VERY LONG TITLE", fontsize=20)
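
One possible workaround, sketched here as an assumption rather than a confirmed fix for the linked matplotlib issue, is to break the long title into several shorter lines before passing it to `set_title`, so that the title is no wider than the axes. The helper name `wrap_title` and the default of 40 characters are hypothetical choices for illustration.

```python
import textwrap


def wrap_title(title: str, max_chars: int = 40) -> str:
    """Break a long title into lines of at most max_chars characters.

    max_chars is an illustrative default; pick a value that suits the
    figure width and font size.
    """
    return textwrap.fill(title, width=max_chars)


# Example with a deliberately long title.
wrapped = wrap_title("VERY LONG TITLE " * 6)
print(wrapped)
```

It would then be used as `ax.set_title(wrap_title("VERY LONG TITLE"), fontsize=20)`. Whether this fully removes the collapsing may depend on the matplotlib version in use.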
1
I'm not entirely clear how your code could have generated the figure you show at the beginning. According to your code, you should always get a 10x10 figure, regardless of the content of your dataframe(s). - Diziet Asahi
Ah, good catch, this might be confusing for others as well. I screenshotted the two plots and uploaded them as one figure, eliminating as much white space as possible. I am going to upload another one. - Ivo Leist
Your toy dataset and code do not reproduce the issue. Please review how to create a Minimal, Complete, and Verifiable example. - Diziet Asahi
@Diziet I was trying to reproduce the issue with the toy dataset as well. There I realized that the issue is not the data but the plot title (see update). Anyway, thank you for pushing me to provide a toy example. - Ivo Leist

1 Answer

0
votes

It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.

For instance:

import seaborn as sns

tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=["Thur", "Fri", "Sat", "Freedom Day", "Sun", "Durin's Day"])
