
At the moment I have a pandas DataFrame called mergeDf (40 rows x 2 columns) with one float column and one categorical column:

NH01        float64
NH01cat    category
dtype: object

I am trying to build a side-by-side boxplot in seaborn that shows all values from column NH01 on the y-axis, grouped by their value in the NH01cat column. My final DataFrame will have 42 columns, where each two adjacent columns hold quantitative and categorical data as in the example (NH01 float, NH01cat categorical, NH02 float, NH02cat categorical, and so on). The final plot should consist of 21 pairs of boxplots, one for each set of two neighboring columns.

      NH01    NH01cat
0   0.428581    NacZ
1   0.425339    NacZ
2   0.428772    NacZ
3   0.425148    NacZ
4   0.428581    NacZ
5   0.433540    NacZ
6   0.422096    NacX
7   0.423431    NacX
8   0.432205    NacX
9   0.431824    NacX
10  0.424194    NacX
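To make the example reproducible, here is a minimal sketch that rebuilds only the 11 rows printed above (the real mergeDf has 40) with the same dtypes. The variable names follow the question; the construction itself is just one assumed way to recreate the data:

```python
import pandas as pd

# Only the 11 rows shown above; the real mergeDf has 40 rows.
values = [0.428581, 0.425339, 0.428772, 0.425148, 0.428581, 0.433540,
          0.422096, 0.423431, 0.432205, 0.431824, 0.424194]
labels = ["NacZ"] * 6 + ["NacX"] * 5

mergeDf = pd.DataFrame({
    "NH01": values,                      # float64 column
    "NH01cat": pd.Categorical(labels),   # category column
})

print(mergeDf.dtypes)
```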

I am trying the below line of code:

ax=sns.boxplot(y=mergeDf['NH01'], hue="NH01cat",orient='v', data=mergeDf, linewidth=2.5)

but the result I am getting is a single boxplot.

How can I group the boxplots by the category in NH01cat?

Thanks


2 Answers


hue= is meant to add a second level of grouping inside the levels of x, and your call has no x. Since you only have one grouping column, pass it as x instead.

Try:

ax=sns.boxplot(y='NH01', x="NH01cat",orient='v', data=mergeDf, linewidth=2.5)
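With x="NH01cat", seaborn draws one box per category. As a rough cross-check of what those boxes summarise, the sketch below computes each category's quartiles with pandas. It uses only the 11 sample rows from the question, so the numbers will differ for the full 40 rows, and it assumes seaborn's default box statistics (linearly interpolated percentiles); whiskers and outliers are not covered here:

```python
import pandas as pd

# Sample rows from the question (11 of the 40).
mergeDf = pd.DataFrame({
    "NH01": [0.428581, 0.425339, 0.428772, 0.425148, 0.428581, 0.433540,
             0.422096, 0.423431, 0.432205, 0.431824, 0.424194],
    "NH01cat": pd.Categorical(["NacZ"] * 6 + ["NacX"] * 5),
})

# One row per category, columns are the 25th, 50th (median) and 75th percentiles.
summary = (
    mergeDf.groupby("NH01cat", observed=True)["NH01"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(summary)
```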

If I now understand your question correctly, you have a DataFrame like this:

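import numpy as np
import pandas as pd

N = 100  # rows
M = 5    # number of value/category column pairs
df = pd.DataFrame()
for i in range(1,M+1):
    df[f'NH{i:02d}'] = np.random.normal(loc=i, size=(N,))                 # float values
    df[f'NH{i:02d}cat'] = np.random.choice(['NacZ','NacX'], size=(N,))  # category labels
print(df.head())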

output:

       NH01 NH01cat      NH02 NH02cat      NH03 NH03cat      NH04 NH04cat  \
0  0.231058    NacZ  1.872279    NacZ  4.048766    NacX  3.869479    NacZ   
1  0.062530    NacX  1.210339    NacZ  3.374466    NacZ  2.827855    NacX   
2  1.146168    NacX  0.752690    NacZ  3.948877    NacZ  4.320780    NacZ   
3  0.266700    NacZ  0.874896    NacX  1.529101    NacX  3.448940    NacZ   
4  1.620292    NacX  0.689638    NacX  2.778528    NacX  4.590301    NacZ   

       NH05 NH05cat  
0  3.757337    NacX  
1  4.552330    NacZ  
2  5.188367    NacX  
3  5.067367    NacZ  
4  4.108142    NacZ  

that you would like to plot with hue-nested boxplots.

To do that, you have to transform your data from "wide" to "long" format. There may be a more efficient way to do this (perhaps a separate question for a pandas expert), but you can use pd.wide_to_long() as long as you first rename the columns slightly:

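import re
df2 = df.copy()
# Rename NH01cat -> cat-NH01 and NH01 -> val-NH01 so each column starts with its stub name
df2.columns = [re.sub('NH([0-9]{2})cat','cat-NH\\1',c) for c in df2.columns]
df2.columns = [re.sub('^NH([0-9]{2})$','val-NH\\1',c) for c in df2.columns]
df2['id'] = df.index
# Stubs 'val'/'cat', separator '-', suffix 'NH' + digits; the suffix goes into a new column 'NH'
df2 = pd.wide_to_long(df2, ['val','cat'], j='NH', i='id', sep='-', suffix='NH\\d+')
df2 = df2.reset_index()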
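If the renaming step feels fragile, here is an alternative sketch that is not part of the original answer: it stacks each value/category column pair with pd.concat and produces the same four columns (id, NH, val, cat). It assumes the columns strictly alternate value, category as in the question; pairs_to_long is a hypothetical helper name and the seeded random data is only a stand-in:

```python
import numpy as np
import pandas as pd


def pairs_to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Stack alternating (value, category) column pairs into one long frame.

    Assumes the columns are ordered NH01, NH01cat, NH02, NH02cat, ...
    """
    pieces = []
    for val_col, cat_col in zip(df.columns[::2], df.columns[1::2]):
        pieces.append(pd.DataFrame({
            "id": df.index,
            "NH": val_col,                   # name of the pair the rows came from
            "val": df[val_col].to_numpy(),
            "cat": df[cat_col].to_numpy(),
        }))
    return pd.concat(pieces, ignore_index=True)


# Stand-in data shaped like df above: 5 pairs of 100 rows each.
N, M = 100, 5
rng = np.random.default_rng(0)
df = pd.DataFrame()
for i in range(1, M + 1):
    df[f"NH{i:02d}"] = rng.normal(loc=i, size=N)
    df[f"NH{i:02d}cat"] = rng.choice(["NacZ", "NacX"], size=N)

long_df = pairs_to_long(df)
print(long_df.head())
```

For the 42-column frame in the question the loop yields 21 pairs, and the result can be passed straight to sns.boxplot(y="val", x="NH", hue="cat", data=long_df).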

Now df2 looks like this:

   id    NH       val   cat
0   0  NH01  0.231058  NacZ
1   1  NH01  0.062530  NacX
2   2  NH01  1.146168  NacX
3   3  NH01  0.266700  NacZ
4   4  NH01  1.620292  NacX

which you can plot:

sns.boxplot(y="val",x="NH",hue='cat', data=df2)
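With 21 pairs the x-axis gets crowded. Below is a sketch of one way to lay out the full plot, assuming seaborn and matplotlib are installed. The random stand-in data, figure size, label rotation, hue_order and output file name are all illustrative choices, not requirements:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random long-format stand-in shaped like df2, but with 21 NH groups.
N, M = 40, 21
rng = np.random.default_rng(1)
df2 = pd.DataFrame({
    "NH": np.repeat([f"NH{i:02d}" for i in range(1, M + 1)], N),
    "val": rng.normal(size=N * M),
    "cat": rng.choice(["NacZ", "NacX"], size=N * M),
})

fig, ax = plt.subplots(figsize=(16, 5))   # wide figure so 21 groups fit
sns.boxplot(
    y="val", x="NH", hue="cat", data=df2,
    hue_order=["NacZ", "NacX"],           # keep the colour order fixed
    linewidth=1, ax=ax,
)
ax.tick_params(axis="x", labelrotation=90)  # vertical NH labels
fig.tight_layout()
fig.savefig("nh_boxplots.png")            # hypothetical output path
```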
