0
votes

EDIT: Removed the old question to make the solution easier to find in this post.

Seaborn is a Python data visualization library based on matplotlib. Seaborn works most naturally when your data is in tidy (long-form) format, with one observation per row.
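If your data starts out in a wide layout, something like the following sketch can reshape it into tidy form. The wide table here, including the second object "MT2" and its values, is made up for illustration; only the column names Object, Metric and Score come from the example data in this post.

```python
import pandas as pd

# Hypothetical wide-format table: one row per object, one column per metric.
# "MT2" and all numbers are invented for this sketch.
wide = pd.DataFrame({
    "Object": ["MT1", "MT2"],
    "B2A1": [7.15, 6.80],
    "B2A2": [2.24, 2.31],
})

# melt() turns each metric column into (Metric, Score) rows,
# giving one observation per row.
tidy = wide.melt(id_vars="Object", var_name="Metric", value_name="Score")
print(tidy)
```

After this, tidy has the same three columns as the CSV shown in this post, so it can be passed to the plotting functions in the same way.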

You can use Pandas DataFrame.loc[] to filter a dataframe.

In the following example I will (1) load some data from a CSV file into a dataframe, (2) filter that data based on specific values in a column, (3) present that data in a boxplot using Seaborn, and (4) decide the order in which the data is presented and what labels should be used.

Some example data

Object,Metric,Score
M11,B2A10,2.7939033333333336
MT1,B2A10,1.287634388888889
MT1,B2A1,7.1535
MT1,B2A2,2.2441833333333334
MT1,B2A3,3.3787333333333334
MT1,B2A4,2.50297
MT1,B2A5,1.4254989999999998
MT1,B2A6,2.91325
MT1,B2A7,1.24806
MT1,B2A8,2.08797725
MT1,B2A9,1.208722

Import libraries and modules

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

Set seaborn style

sns.set(style="whitegrid", palette="colorblind")

Load data and generate a list of items we want to filter

data = pd.read_csv("data.csv") 

metrics = ["B2A10", "B2A1"]  # avoid naming this "list", which would shadow the Python built-in

Filter data using .loc and place into new dataframe

filtered_data = data.loc[data['Metric'].isin(metrics)]
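To try the filtering step without a file on disk, here is a minimal sketch that reads a few rows of the example data from a string. The names csv_text, wanted and mask are just illustrative choices.

```python
import io

import pandas as pd

# A few rows of the example data from this post, inlined as text.
csv_text = """Object,Metric,Score
MT1,B2A10,1.287634388888889
MT1,B2A1,7.1535
MT1,B2A2,2.2441833333333334
MT1,B2A3,3.3787333333333334
"""
data = pd.read_csv(io.StringIO(csv_text))

# isin() builds a boolean mask: True where Metric is one of the wanted values.
wanted = ["B2A10", "B2A1"]
mask = data["Metric"].isin(wanted)

# .loc with a boolean mask keeps only the rows where the mask is True.
filtered_data = data.loc[mask]
print(filtered_data)
```

Note that isin() compares whole values, so "B2A1" does not accidentally match "B2A10".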

Generate a boxplot using the filtered data

fig, ax = plt.subplots(figsize=(10,6))
ax = sns.boxplot(x='Metric', y='Score', data=filtered_data, order=["B2A1", "B2A10"], ax=ax)
ax = sns.swarmplot(x="Metric", y="Score", data=filtered_data, color=".25", order=["B2A1", "B2A10"], ax=ax)
ax.set_xlabel('Label X-Axis')
ax.set_ylabel('Label Y-Axis')
plt.title('Title',fontsize=16)
labels = [item.get_text() for item in ax.get_xticklabels()]
labels[0] = 'Sample 1'
labels[1] = 'Sample 2'
ax.set_xticklabels(labels)
plt.savefig('test.png', dpi=300, bbox_inches='tight')

The final graph should look like this.

[Image: the resulting boxplot]


2 Answers

0
votes

I will answer both of your questions below.

1) To sort the values in the plot (for instance sorting by Score) you can first sort Metric (which you are using as order parameter for the boxplot) by 'Score' doing the following:

sorted_order = [x for _, x in sorted(zip(score_tidy.Score, score_tidy.Metric))]

Then pass order=sorted_order to your boxplot call.
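To see what the sorted(zip(...)) line produces, here is a small pure-Python sketch. The two lists are hypothetical stand-ins for score_tidy.Score and score_tidy.Metric. One thing to watch for: if a Metric value occurs on several rows, it also occurs several times in the result, so you may want to drop the repeats before using the list as order.

```python
# Hypothetical stand-ins for score_tidy.Score and score_tidy.Metric.
scores = [2.79, 1.29, 7.15, 2.24]
metrics = ["B2A10", "B2A10", "B2A1", "B2A2"]

# zip() pairs each score with its metric; sorted() orders the pairs by score.
sorted_order = [m for _, m in sorted(zip(scores, metrics))]
print(sorted_order)

# dict.fromkeys() keeps the first occurrence of each metric and preserves order,
# so every category appears exactly once.
unique_order = list(dict.fromkeys(sorted_order))
print(unique_order)
```

With repeats removed, each metric is placed according to its lowest score. Whether that is the right ranking (as opposed to, say, the median per metric) depends on what you want the plot to show.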

2) To change the xticks labels of any matplotlib-based plot (e.g. those generated through seaborn) you can get the handle of your axes (taking the current one in the following through plt.gca()) and do this:

plt.gca().set_xticks(range(len(sorted_order)), labels=sorted_order)

This way you will get your x-tick labels from Metric, sorted according to Score.

0
votes

I found solutions to both problems.

(1) The first solution is simply to filter the items I want from the original data and save them as a new data file. I think this is suboptimal, but it works for my purposes. It would be more efficient to filter the items directly in the dataframe before plotting, without writing a second file.

# Load list data

data = pd.read_csv('data.csv')

# Define items

items = ['item1', 'item2', 'item3']

# Filter items using Pandas isin() function

items_1_3 = data[data.Column.isin(items)]
items_1_3.to_csv('data_1_3.csv', index=False)

(2) I found a way to change the labels of the graph based on their position (remember that Python indexing starts at 0, not 1). I add this code directly after the code that defines my graph.

labels = [item.get_text() for item in ax.get_xticklabels()] 
labels[0] = 'one' 
labels[1] = 'two' 
labels[2] = 'three' 
ax.set_xticklabels(labels)
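A possible variation, in case the position-based approach gets fragile when the order changes: build the labels from a lookup keyed by the category name. This is only a sketch; order and label_map are hypothetical names, and the metric values are taken from the example data in the question.

```python
# The category order passed to the plot (same list you would give to order=).
order = ["B2A1", "B2A10"]

# Hypothetical mapping from category name to the label to display.
label_map = {"B2A1": "Sample 1", "B2A10": "Sample 2"}

# Look up each category; fall back to the original name if it has no entry.
labels = [label_map.get(metric, metric) for metric in order]
print(labels)

# With an existing axes object you would then call:
# ax.set_xticklabels(labels)
```

Because the labels are derived from order, reordering the categories does not require renumbering anything.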

Hope someone will find this useful.