
I want to plot grouped boxplots with seaborn but the data is present in two different DataFrame objects.

The dataframes have identical row and column labels but different values, and each has shape (10,000 x 24). The columns are cancer types and the rows are genes.

When I plot only one of the dataframes, everything looks nice. Following the docs I joined the two dataframes like:

import pandas as pd

df1 = pd.read_csv('a.csv')
df2 = pd.read_csv('b.csv')
# categorical variable similar to 'smoker' attribute in tips dataset
df1['kind'] = 'catA'
df2['kind'] = 'catB'
both = pd.concat((df1, df2))

When I now plot the data, everything looks good:

seaborn.boxplot(data=both)

gives me one beautiful boxplot.

However, I am not able to set the hue, x and y attributes correctly to get the same plot as a grouped boxplot. I know that it should be something similar to:

seaborn.boxplot(x=?, y=?, hue='kind', data=both)

but I can't figure out how to set x and y to get the same behaviour as if they were set to None.

Thanks for any help or suggestions.

Best, Roman


1 Answer


It seems that your dataframe is in the 'wide' format. You'll need to convert it to the 'long' format (functions such as pandas.melt or pandas.wide_to_long should help). Organize your data so that it ends up as an N x 3 dataframe whose columns are: 1. your gene expression measurement, 2. the cancer type, and 3. your new categorical variable (similar to 'smoker'). You can have more than three columns if there is another variable you want to keep (such as gene name), but these three need to be present for the plot to work.

(I may have misinterpreted the content of your data, but this is my understanding of what you are measuring and what the variables are.)
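
As a minimal sketch of the wide-to-long step, assuming the only non-measurement column is 'kind': the small random dataframes below are made-up stand-ins for a.csv and b.csv, and the names both_long, 'cancer_type' and 'measurement' are illustrative choices, not anything required by pandas or seaborn.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for a.csv and b.csv:
# rows are genes, columns are cancer types (names are made up).
rng = np.random.default_rng(0)
cancer_types = ["BRCA", "LUAD", "COAD"]
df1 = pd.DataFrame(rng.normal(0.0, 1.0, size=(5, 3)), columns=cancer_types)
df2 = pd.DataFrame(rng.normal(1.0, 1.0, size=(5, 3)), columns=cancer_types)

df1["kind"] = "catA"
df2["kind"] = "catB"
both = pd.concat((df1, df2), ignore_index=True)

# Wide -> long: keep 'kind' as an identifier column and stack every
# cancer-type column into (cancer_type, measurement) pairs.
both_long = both.melt(
    id_vars="kind", var_name="cancer_type", value_name="measurement"
)

print(both_long.shape)  # (30, 3): 10 rows x 3 cancer types, 3 columns
```

If the real files also carry a gene-name column, it would need to be listed in id_vars as well so that it is not melted along with the measurements.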

Then, once both has been converted to the long format, your command would look something like:

seaborn.boxplot(x='measurement', y='cancer_type', hue='kind', data=both)
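
The command above puts the numeric variable on x, which draws horizontal boxes. To get vertical boxes with one group per cancer type, as in the wide-form plot from the question, the categorical variable would go on x and the measurement on y. A small sketch on made-up long-form data (the dataframe contents and the name both_long are assumptions for illustration only):

```python
import numpy as np
import pandas as pd
import seaborn

# Hypothetical long-form data: one row per (cancer type, kind, measurement).
rng = np.random.default_rng(0)
both_long = pd.DataFrame({
    "cancer_type": np.repeat(["BRCA", "LUAD"], 20),
    "kind": np.tile(np.repeat(["catA", "catB"], 10), 2),
    "measurement": rng.normal(size=40),
})

# Categorical variable on x, numeric variable on y:
# vertical boxes, with a catA/catB pair for each cancer type.
ax = seaborn.boxplot(x="cancer_type", y="measurement", hue="kind", data=both_long)
```

In a script, calling matplotlib.pyplot.show() or ax.figure.savefig(...) afterwards would display or save the figure.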