
I have searched for ways to center histogram bars on tick marks, but I have not found a solution that works with seaborn displot. displot lets me stack the histogram according to a column in the dataframe, so I would prefer a solution that uses displot, or something else that allows stacking based on a dataframe column with color-coding via palette.

Even after setting the tick values, I cannot get the bars to center on the tick marks.

Example code

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Center the histogram on the tick marks
tips = sns.load_dataset('tips')
sns.displot(x="total_bill",
            hue="day", multiple='stack', data=tips)
plt.xticks(np.arange(0, 50, 5))


I would also like to plot a histogram of a variable that takes a single value, and choose the bin width so that the bar is centered on that value (0.5 in this example).

I can center the bar by setting the number of bins equal to the number of tick marks, but the resulting bar is very thin. How can I increase the bin width when there is only one bar but I still want to display all the other possible values? With all the tick marks displayed, the bar is tiny. I want the bar to stay centered on the 0.5 tick mark but be wider, since 0.5 is the only value with any counts. Any solutions?

tips['single'] = 0.5
sns.displot(x='single',
                hue="day", multiple = 'stack', data=tips, bins = 10)
plt.xticks(np.arange(0, 1, 0.1))

Edit: Would it be possible to have more control over the tick marks in the second case? I do not want the labels rounded to 1 decimal place; I want to choose which tick marks to display. Is it possible to display just one tick value and have the bar centered on it?

Do min_val and max_val in this case refer to the value of the variable? That would be 0 here, so the x-axis would extend into negative values even though there are none, and I do not want to display them.

Tick marks are just a list of values that lie within the axis bounds. You can construct them as you wish and they will appear at the appropriate positions. Please see my edited answer for an explanation. - skuzzy
The min and max values were carryovers from the previous example. Essentially, you have to provide the range over which the bins should be calculated. Additionally, I think that what you are really trying to plot is a categorical bar graph. Is that the case? - skuzzy
@skuzzy Thanks a lot for the explanation. I was not trying to plot a categorical value, but it helps me see the logic I can use when I want to represent a limited set of values on the x-axis. - Anusha

1 Answer


For your first problem, you may want to work out a few properties of the data that you're plotting, for example its range. You may also want to choose the number of bins beforehand.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')
min_val = tips.total_bill.min()
max_val = tips.total_bill.max()
val_width = max_val - min_val
n_bins = 10
bin_width = val_width/n_bins

sns.histplot(x="total_bill",
                hue="day", multiple = 'stack', data=tips,
                bins=n_bins, binrange=(min_val, max_val),
                palette='Paired')
plt.xlim(0, 55) # Define x-axis limits

Another thing to remember is that the width of a bar in a histogram marks the bounds of its range. So a bar spanning [2,5] on the x-axis means that the values counted in that bar fall within that range.
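To make the "a bar is a range" point concrete, here is a small sketch that uses np.histogram directly rather than seaborn. The sample values and edges are made up for illustration; they are not taken from the tips data.

```python
import numpy as np

values = np.array([2.0, 3.0, 4.9, 5.0, 7.0])   # made-up sample data
edges = np.array([2.0, 5.0, 8.0])              # two bins: [2, 5) and [5, 8]

# Every bin includes its left edge; only the last bin also includes its right edge.
counts, _ = np.histogram(values, bins=edges)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"bar from {left} to {right}: {count} values")
```

The value 5.0 is counted in the second bar, not the first, which is why the edges (and not the centers) are what define a bar.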

With this in mind, a solution is easy to formulate. If we want to keep the original bars and mark the bounds of each bar, one solution looks like this:

plt.xticks(np.arange(min_val-bin_width, max_val+bin_width, bin_width))

Bounded bars
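A possible variation, not part of the original approach: np.arange with a non-integer step can include or drop the final value because of floating-point rounding, so the last tick may not line up with the last bin edge. A sketch using np.linspace avoids that by fixing the number of edges instead of the step. The helper name edge_ticks is hypothetical, and the numbers below are placeholders standing in for tips.total_bill.min() and tips.total_bill.max().

```python
import numpy as np

def edge_ticks(min_val, max_val, n_bins):
    """Return the n_bins + 1 edges of equal-width bins spanning [min_val, max_val]."""
    return np.linspace(min_val, max_val, n_bins + 1)

# Placeholder range; in the example above you would pass min_val, max_val, n_bins.
ticks = edge_ticks(3.0, 51.0, 10)
print(ticks)
```

Passing the result to plt.xticks(ticks) should put one tick on every bin boundary when the same range and bin count are given to histplot through binrange and bins. Unlike the np.arange call above, this does not add an extra tick one bin-width below min_val.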

Now, if we offset the ticks by half a bin-width, we will get to the centers of the bars.

plt.xticks(np.arange(min_val-bin_width/2, max_val+bin_width/2, bin_width))

Centered Ticks - Paired
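If you would rather keep round tick values (such as 0, 5, 10, ...) and move the bars instead of the ticks, one option is to build the bin edges from the ticks and pass them through the bins argument, which accepts explicit bin breaks. This is only a sketch: centered_bin_edges and plot_centered are hypothetical helper names, and the ticks are assumed to be evenly spaced and to cover the whole data range.

```python
import numpy as np

def centered_bin_edges(ticks):
    """Return bin edges so that each tick sits at the center of one bin.

    Assumes `ticks` are evenly spaced and sorted in ascending order.
    """
    ticks = np.asarray(ticks, dtype=float)
    step = ticks[1] - ticks[0]
    # Shift every tick left by half a step, then append the right edge
    # of the last bin, giving one more edge than there are ticks.
    return np.append(ticks - step / 2, ticks[-1] + step / 2)

def plot_centered(data, column, hue, ticks):
    """Stacked displot whose bars are centered on `ticks` (hypothetical helper)."""
    # Imported here so centered_bin_edges can be used without seaborn installed.
    import seaborn as sns

    g = sns.displot(data=data, x=column, hue=hue, multiple="stack",
                    bins=centered_bin_edges(ticks))
    g.set(xticks=ticks)
    return g

ticks = np.arange(0, 60, 5)          # 0, 5, ..., 55
edges = centered_bin_edges(ticks)    # -2.5, 2.5, ..., 57.5
```

With the tips data this would be called as plot_centered(tips, "total_bill", "day", ticks). Values outside the outermost edges would not be counted, so the ticks should be chosen to span the data.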

For your single-value plot, the idea remains the same: control the bin width, the x-axis range, and the ticks. The bin width has to be set explicitly, because the automatically inferred width will probably be 1 unit, which on this plot will have no visible thickness. Histogram bars always indicate a range, even when we have just a single value. This is illustrated in the following example and figure.

from matplotlib.ticker import FormatStrFormatter  # used for the tick labels below

single_val = 23.5
tips['single'] = single_val
bin_width = 4

fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12,4)) # Get 2 subplots 

# Case 1 - With the single value as x-tick label on subplot 0
sns.histplot(x='single',
                hue="day", multiple = 'stack', data=tips, 
                binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
                palette='rocket',
                ax=axs[0])
ticks = [single_val, single_val+bin_width] # 2 ticks - given value and given_value + width
axs[0].set(
    title='Given value as tick-label starts the bin on x-axis',
    xticks=ticks,
    xlim=(0, int(single_val*2)+bin_width)) # x-range such that bar is at middle of x-axis
axs[0].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))

# Case 2 - With centering on the bin starting at single-value on subplot 1
sns.histplot(x='single',
                hue="day", multiple = 'stack', data=tips, 
                binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
                palette='rocket',
                ax=axs[1])

ticks = [single_val+bin_width/2] # Just the bin center
axs[1].set(
    title='Bin centre is offset from single_value by bin_width/2',
    xticks=ticks,
    xlim=(0, int(single_val*2)+bin_width) ) # x-range such that bar is at middle of x-axis
axs[1].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))

Output:

Single-value chart
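In both cases above the bin starts at the single value, so the bar center sits half a bin-width to its right. If the bar should be centered on the value itself (0.5 in the question), a sketch of one way to do it is to use a single bin whose range is symmetric around the value. The helper names centered_binrange and plot_single_value are hypothetical, and the bin width of 0.2 is an arbitrary choice.

```python
def centered_binrange(value, bin_width):
    """Return (low, high) for one bin of width `bin_width` centered on `value`."""
    half = bin_width / 2
    return (value - half, value + half)

def plot_single_value(data, column, hue, value, bin_width, ticks, xlim):
    """Stacked histogram with one bar centered on `value` (hypothetical helper)."""
    # Imported here so centered_binrange can be used without seaborn installed.
    import seaborn as sns

    ax = sns.histplot(data=data, x=column, hue=hue, multiple="stack",
                      bins=1, binrange=centered_binrange(value, bin_width))
    # The ticks are independent of the bins: show as many or as few as you like.
    ax.set(xticks=ticks, xlim=xlim)
    return ax

low, high = centered_binrange(0.5, 0.2)   # roughly (0.4, 0.6)
```

For the question's data this could be called as plot_single_value(tips, "single", "day", 0.5, 0.2, ticks=[0.5], xlim=(0, 1)) to show only the 0.5 tick, or with ticks=[0, 0.25, 0.5, 0.75, 1] to show other possible values. Making bin_width larger widens the bar without moving its center.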

From your description, I suspect that what you really want is a categorical bar graph. The centering is then automatic, because the bar no longer represents a range but a discrete category. Given the numeric and continuous nature of the variable in the example data, I would not recommend such an approach. Pandas supports categorical bar plots. See here. For our example, one way to do this is as follows:

n_colors = len(tips['day'].unique()) # Get the number of unique categories
agg_df = tips[['single', 'day']].groupby(['day']).agg(
    val_count=('single', 'count'),
    val=('single','max')
).reset_index() # Get aggregated information along the categories
agg_df.pivot(columns='day', values='val_count', index='val').plot.bar(
    stacked=True,
    color=sns.color_palette("Paired", n_colors), # Choose "number of days" colors from palette
    width=0.05 # Set bar width
    ) 
plt.show()

This yields:

pandas categorical plot